Annette Klosa-Kückelhaus, Ilan Kernerman
Lexicography of Coronavirus-related 
neologisms: An introduction 


1 Background 


This volume of Lexicographica. Series Maior focuses on lexicographic neology and 
neological lexicography concerning COVID-19 neologisms, featuring papers origi- 
nally presented at the third Globalex Workshop on Lexicography and Neology 
(GWLN 2021).¹ GWLN 2021 was held online in conjunction with Australex 2021,²
with a focus on neologisms arising in relation to the COVID-19 pandemic. Papers 
discussing various issues related to the detection of such neologisms — including 
new words, new meanings of existing words, and new multiword units — and their 
representation in lexicography and dictionaries were invited to offer cross-world 
views on lexicographic detection and representation of Coronavirus-driven neolo- 
gisms for different languages. Similar challenges regarding COVID-19 neologisms 
and lexicography arise for any contemporary language, for example how to detect 
such neologisms (corpus analysis and editorial means of identification, evaluation 
of other data, e.g. blogs and chats) or how dictionary users can help with finding 
and informing about them. But also the extent of borrowing COVID-19 neologisms 
from other languages (and which ones), in contrast to the use of word formation 
processes to enlarge the Coronavirus-related vocabulary in a specific language, 
needs to be examined, and questions of prescriptive vs. descriptive lexicographic 
information on such neologisms need to be addressed. 

The GWLN series began as a single event held in conjunction with the 22nd Biennial
Meeting of the Dictionary Society of North America (DSNA) at Indiana University,
Bloomington, in 2019³ and included thirteen invited papers from around the world,
of which eight formed a special issue of the DSNA’s journal Dictionaries, published
the following year (2020, 41.1⁴). GWLN-2⁵ was planned in conjunction with the Euralex
2020 Congress (Alexandroupolis, Greece), but due to the COVID-19 pandemic


1 https://globalex2021.globalex.link/ (last access: 10 June 2022). 

2 https://www.adelaide.edu.au/australex/ (last access: 10 June 2022). 

3 https://dictionarysociety.com/ (last access: 10 June 2022). 

4 https://dictionarysociety.com/wp-content/uploads/2020/05/Dictionaries-41.1-Table-of-Contents.pdf (last access: 10 June 2022).

5 https://globalex2020.globalex.link/gw-euralex2020 (last access: 10 June 2022). 
it was partially held online (November 2020)⁶ and as a special session at Euralex
2020 online (in 2021),⁷ with selected papers published as a special issue of International
Journal of Lexicography (Klosa-Kückelhaus/Kernerman 2021).⁸

Lexicography has been around for thousands of years and has always had to 
adapt to developments in society and language, apparently more than ever in the 
last generation with its increasingly rapid and radical technological changes. Neology
has been there from the start, driving language and, so to speak, inciting
lexicography. Likewise, in recent decades neology has been drawing more attention
in research communities and inspiring new practical applications, such as at uni- 
versity or national language observatories or in the language technology industry, 
as well as with the general public. The speed of novelty in daily life accelerates and 
the volume of innovations grows exponentially — all defined by language as well as 
affected by and affecting language. Altogether, there is greater interest in neolo- 
gisms and in the role of lexicographic resources to capture and disseminate them to 
the world. 

The overall aim of GWLN and its corresponding publications is to explore this 
intersection of neology and lexicography worldwide, uncover the common factors 
and highlight individual features, expose and share the findings with each other 
and enhance mutual understanding, professional competence, and user satisfac- 
tion. The main issues in question begin with the identification of neologisms and 
go on to comprise their categorization and lexicographic treatment and representa- 
tion. As such, the description in our introduction to the special issue of Dictionaries 
(Klosa-Kückelhaus/Kernerman 2020) is appropriate here, too, and we reproduce it
with slight adjustments: 

“Neology constitutes a natural, dynamic and multilateral part of all living human 
languages, whether as a reflection or for facilitation of linguistic communication, and 
lexicographic interest in neologisms is at least as old as dictionaries themselves. There 
is a vast field of research of neologisms, pertaining to their origin (stemming from the 
given language as in new word formation, or loan words from other languages includ- 
ing the dominance of English today, as well as combining both), distribution (in gen- 
eral language and in domain-specific language, that is terminology), identification 
(applying corpus linguistics methods, editorial methods, user generated candidates, 
and comparison of different methods), evaluation (such as in blogs and chats), and 
more. The general definition of neologisms as applied here refers to new words, new 
multiword units, new elements of word formation, and new meanings of either of 
them, and addresses lexicography-driven or -oriented aspects, including: 


6 For the program, see https://globalex2020.globalex.link/globalex2020-online/ (last access: 10 June 
2022). 

7 https://euralex2020.gr/ (last access: 10 June 2022). 

8 https://academic.oup.com/ijl/issue/34/3 (last access: 10 June 2022). 
— How to interoperate lexicographic datasets with online resources and incorporate
neologisms into dictionaries (the media, formatting, labelling, etc.)

- How to deal with grammatical/orthographic/pronunciation variation (descrip- 
tive vs. prescriptive approaches) 

- How to explain meaning with/without encyclopaedic information, and how to 
use illustrations and audio-visual media 

— How well are neologisms that are integrated in dictionaries accepted by the 
community (issues of rejection of new words and language purism) 

— How differently, if at all, should neologisms be treated in different dictionary 
types (e.g. in historical comprehensive ones as opposed to those focusing on 
current usage; in monolingual vs. bilingual dictionaries; in special dictionaries 
of neologisms; in special domain dictionaries) 

— How to deal with neologisms that are no longer new and with those no longer used 

— How can dictionary users help with finding and informing about neologisms 


The papers in this volume pursue the discussion on some of these aspects, presenting 
state-of-the-art research into neology [specific to the COVID-19 pandemic] and ideas 
on modern lexicographic treatment of neologisms in various dictionary types.” 


2 This volume 


The thirteen papers in this volume focus on ten languages: one Altaic (Korean), one 
Finno-Ugric (Hungarian), two Germanic (English and German), four Romance (French, 
Italian, [Brazilian and European] Portuguese and [Pan-American and European] Span- 
ish), and one Slavic (Croatian), as well as the Sign Language of New Zealand. Special- 
ized dictionaries of neologisms are discussed as well as general language ones, 
monolingual, bilingual and multilingual lexical resources, print and electronic dictio- 
naries. Questions regarding terminology as well as general language and standard 
and norm regarding COVID-19 neologisms are raised and different methods of detect- 
ing candidates in media corpora, as well as by user contributions, are discussed. 

The papers are broadly arranged in four groups of three or four papers each.
The first group features papers regarding English, German, and Korean, respectively,
growing out of systematic neological and lexicographic research carried out at
their authors’ institutions over some years, which lends solid support and a wide
perspective to their findings. The second consists of three papers regarding Spanish
neologisms in traditional and upcoming lexicographic contexts from Europe
and Latin America. The third presents work on Croatian, Hungarian, Italian, and
Portuguese in Portugal and Brazil, i.e. to some extent lesser-used languages, which
are no less pertinent in dealing with similar issues. The fourth group of papers
extends beyond mainstream lexicography to study COVID-19 neology in relation to 
collaborative editing in Wiktionary, to terminology, and to New Zealand Sign Language.
Together, this collection offers rich insights that sometimes overlap while
remaining unique. 

In The Oxford English Dictionary and the language of Covid-19, Danica Sala- 
zar and Kate Wild offer insight into how the editorial team working on this re- 
nowned historical dictionary of English reacted to the challenges posed by the 
rapid expansion of new vocabulary during the Coronavirus pandemic: “The lexical 
adaptation necessitated by this global health crisis has been unprecedented in 
speed and scope, and in response, the Oxford English Dictionary (OED) has continu- 
ally revised its coverage, publishing special updates of Covid-19-related words in 
2020 outside of its usual quarterly publication cycle.” The Oxford Languages’ monitor
corpus of English and other text databases were used to monitor the development
of pandemic-related words, and the authors describe how new lexemes (most
prominently COVID-19) and words with new meanings (e.g. bubble) or new signifi- 
cance (e.g. social distancing) were detected and treated lexicographically, for exam- 
ple by revising existing entries and adding new ones. Questions of how the use of 
terminology in general discourse and regional variation should be transferred into 
lexicographic information are discussed as well. 

Finally, the authors explain how their work expanded beyond the dictionary it- 
self: “The OED's efforts to document the lexical change brought to the English lan- 
guage by the coronavirus pandemic continued throughout 2020, culminating with 
the Words of an Unprecedented Year report, which was published at the end of 
the year in place of the usual selection of a single Word of the Year (Oxford Lan- 
guages 2020). This expansive report on the words that defined 2020 features an entire 
section dedicated to the language of Covid-19.” Many dictionary projects around the 
world reacted in a similar manner and started to publish texts on the COVID-19 vo- 
cabulary addressed to the public, as other papers in this volume show. 

While the OED, as a comprehensive dictionary of general language, will include
only some highly frequent new lexemes or new meanings, specialized
dictionaries of neologisms can be more generous when it comes to the number
of new entries. In the paper titled German Corona-related neologisms and their lexicographic
representation, Annette Klosa-Kückelhaus discusses this question and
contrasts two different perspectives: “There are some (neologism) dictionaries that
only record neologisms retrospectively, that is after their lexicalization. [. . .] Other
neologism dictionaries [. . .] record neologisms [. . .] before they are fully lexicalized,
but are nevertheless accepted parts of the lexicon.” Presenting data on an online neologism
dictionary published by the Leibniz Institute for the German Language (IDS),
the author demonstrates how both approaches are combined in one project so that 
dictionary users may find information on COVID-19 neologisms (new lexemes, 
new meanings, and new usages) as soon as possible throughout the pandemic de- 
velopment. She also discusses how to detect candidates for inclusion, for example 
by continuously evaluating user contributions via a word proposal form on the
dictionary webpage. 

Overall, this dictionary project seems to have profited from the challenges 
posed by the rapid vocabulary expansion throughout the pandemic, as “the general 
awareness for the lexicographic work at IDS and for the usefulness of reliable, up-to-date
dictionaries” was raised, “making it worthwhile to immerse in lexicography
‘at the pulse of time’.”

The emergence and spread of Korean COVID-19 neologisms in news ar- 
ticles and user comments and their lexicographic description is the topic of the 
paper by Kilim Nam, Jinsan An and Hae-Yung Jung, in which they examine the oc- 
currence frequencies and usage trends of COVID-19 neologisms in news articles and 
user comments related to the pandemic, to provide information on Korean neolo- 
gism usage across genre. As “COVID-19 neologisms, in particular, have proliferated 
for the past year or so, to express, describe, and comment on a global phenomenon, 
constituting an unprecedented case of profuse and multifaceted neological creativity
centered on a single topic”, they lend themselves especially well to analyzing the
differences in distribution and trends across genres. By carrying out secondary col- 
locate and n-gram analyses in addition to frequency and primary collocate analy- 
ses, the authors collect data providing a better understanding of the use context for 
neologisms, in a case study of the neologism K-quarantine. Finally, they propose a 
microstructural model for COVID-19 neologisms that integrates the findings of the 
study, taking the neologisms Wuhan Pneumonia and K-quarantine as examples. 
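
The kind of counting that underlies frequency, collocate, and n-gram analyses can be sketched in a few lines. The following is only an illustration under stated assumptions: the toy sentence is invented, the function names `ngrams` and `collocates` are hypothetical, and the window size of two tokens is an arbitrary choice. It does not reproduce the authors' actual tools or corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Skip position i itself so the node word is not its own collocate.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Invented toy text, not data from the study.
text = "the k-quarantine model was praised while the k-quarantine policy was debated"
tokens = text.split()

primary = collocates(tokens, "k-quarantine")   # primary collocates of the node word
bigrams = Counter(ngrams(tokens, 2))           # frequency of each two-word sequence
```

Running `collocates` again on one of the strongest primary collocates would be one possible reading of a "secondary" collocate analysis; whether this matches the procedure used in the paper is an assumption.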

The results presented in this paper show that comment data prove invaluable for 
lexicographic description of neologisms: “The value of comment data in lexicographic 
description ultimately lies in the pragmatic information and the socio-cultural back- 
ground it provides on headwords and which are not easily seen in existing dictionar- 
ies. Moreover, unlike articles, comments are produced by a multitude of commenters 
and reflect their emotions and stances in relation to the relevant neologisms, providing
dictionary users and future generations with fresh, raw examples of real-life language
for neologism headwords.” The authors concede, though, that experts need to
decide to what extent the political incorrectness of commenters’ language may be
used in dictionaries.

Moving to the lexicographic description of COVID-19 neologisms in Spanish, it 
becomes evident that the question of a lexeme losing its neologism status by being 
included in general dictionaries also needs to be discussed. In their paper Lexico- 
graphic detection and representation of Spanish neologisms in the COVID-19 
pandemic, Pedro J. Bueno and Judit Freixa “address the neological process and [. . .]
reflect on the various stages of it, from the time a neologism is born until the moment 
it ceases to be one because it has been dictionarised” (i.e. incorporated into a dictio- 
nary). Based on their definition of “pandemic neologisms” and the neological process, 
the authors give information on their corpus data and data analysis methods be- 
fore presenting three different groups of COVID-19 neologisms: “non-dictionarisable 
neologisms”, “neologisms in the antechamber of dictionarization”, and “dictionarisable
neologisms”. They also discuss how some of the neologisms found in their study
have recently been added to the Diccionario de la lengua española (DLE), the authori- 
tative Spanish language dictionary published by the Royal Spanish Academy with 
participation of the Association of Academies of the Spanish Language. 

The authors point out that the inclusion of neologisms in dictionaries is “dual 
property acting on a two-fold plane: that of consolidation in use on the one hand, 
and that of the criteria governing the elaboration of dictionaries on the other”. 
Some thought is also given to the different categories that neologisms fall into: short-lived,
fleeting ones, and those that stay on, become fully lexicalized, and are accordingly
recorded in general language dictionaries.

Andreína Adelstein and Victoria de los Ángeles Boschiroli take this discussion 
a step further in their paper Spanish neologisms during the COVID-19 pan- 
demic: Changing criteria for their inclusion and representation in dictionar- 
ies, by looking not only into the inclusion of COVID-19 neologisms in (synchronic 
and historical) general language dictionaries of Spanish, but also into a bilingual 
English-Spanish dictionary and a Spanish neologism dictionary aiming to cover 
geolectal variants in six Spanish-speaking countries in Pan-America. The authors 
describe the different criteria used in the process of inclusion and treatment of the 
lexemes in those dictionaries starting their study with data obtained by the Antenas 
Neológicas Network, which are “collected exclusively from the written press of the 
six countries that make up the network”. They concede that “this may be regarded 
as a limitation in terms of diaphasic variation in relation to pandemic vocabulary, 
but on the other hand, it guarantees a certain degree of institutionalization, which 
is an essential aspect when considering the inclusion of new words in a general lan- 
guage dictionary.” 

By comparing how different types of dictionaries include/exclude COVID-19 
neologisms, they find that there is “a certain degree of overlap of some features 
which are traditionally thought to be specific to each type of dictionary: [. . .] Dic- 
tionaries which, unlike dictionaries of neologisms (which make no claim to finality 
of stability regarding the place in the language of the items collected), are not re- 
stricted to these phenomena or not supposed to collect them, ended up recording 
ephemeral or witness items, with a very low or null frequency of use.” 

In a third perspective on the Spanish language, Magdalena Coll and Mario Bar- 
ité focus on the inclusion of technical COVID-19 neologisms into a general language 
dictionary of Spanish in their paper Specialized voices in the 23rd edition of the 
Diccionario de la lengua española: Analysis of the COVID-19 field and its neo- 
logisms. By analyzing the lexicographic treatment of specialized language neolo- 
gisms as well as new words beginning with CORONA-, they assess the particularities 
of the dictionaries in question regarding the incorporation of the new words, as 
well as the degree of correspondence or complementarity between the last two edi- 
tions of DLE. The authors demonstrate how “the new additions open up a debate 
on the treatment of neologisms in academic lexicography, in a particularly unique
scenario”. 

Here again, the rapid vocabulary expansion and its subsequent lexicographic 
treatment throughout the COVID-19 pandemic is seen as an “opportunity for lexi- 
cography and terminology researchers”, who should “discuss and propose consis- 
tent solutions for the incorporation of scientific and specialized words into DLE and 
other Spanish dictionaries” and “leave behind vague criteria for incorporating or 
excluding scientific terms, scientific definitions not easily understood by a regular 
audience, conceptual inaccuracies, and somewhat erratic assignments of thematic 
labels”. 

In the first paper of the next group in this volume, How the COVID-19 pan- 
demic is changing the Hungarian language: Building a domain-specific Hun- 
garian/Italian/English dictionary of the COVID-19 pandemic, Judit Papp looks 
into ways of compiling a trilingual online dictionary with COVID-19 neologisms 
using different corpus and dictionary writing tools: “With the creation of the dictionary,
my aim is to fill [. . . the] lexicographic gap primarily concerning the Hungarian-Italian
language pair and to organize this content in a free online tool (a rich
database) that is easy to search and useful for linguists and translators. The third 
language is English, as the comparison with it is inevitable. [. . .] papers, findings, 
and results of scientists’ experiments relating to COVID-19 are published in English 
and this means that English plays an important role in the creation of neologisms. 
In both Hungarian and Italian, we record a certain number of loans, calques, and 
adaptations”. 

Here, again, the author interprets the high number of COVID-19 neologisms as 
a sign of the creativity and vitality of a language (namely Hungarian), and discusses
how these aspects affect the lexicographic description in an online dictionary
(here a trilingual dictionary of equivalents).

Questions of standardization not only arise regarding terminology, but also in 
connection with general language. In their paper Coronavirus-related neologisms: 
A challenge for Croatian standardology and lexicography, Milica Mihaljević,
Lana Hudeček and Kristian Lewis discuss which COVID-19 neologisms collected from
media corpora and online sources should become part of general language dictionar- 
ies. They distinguish between Croatian neologisms (single and multiword units) and 
loanwords and loan translations and stress the importance of responding with pre- 
scriptive information in their dictionary to the high number of user questions (regard- 
ing orthography, morphology, word formation, usage in a sentence and, last but not 
least, meaning) concerning all types of neologisms. 

Their starting point for the lexicographic description of COVID-19 neologisms 
was the Glossary of Coronavirus compiled by a small group of lexicographers with a 
clearly descriptive intention: “The purpose of the Glossary was to meet the needs of 
Croatian speakers as soon as possible. It usually records terms as they are used and 
does not give any normative advice. It includes jargon words as well as scientific 
terms which entered the general language”. Entries in this glossary were then systematically
searched in those corpora that are the basis for the Croatian Web Dictionary
– Mrežnik, a normative dictionary. The comparison of the descriptive and the
normative approach is informative for other dictionary projects
as well.

In Sílvia Barbosa and Susana Duarte Martins’ paper The neologisms of the 
COVID-19 pandemic in European Portuguese: From media to dictionary, we 
learn about the occurrence of COVID-19 neologisms in the press and social net- 
works and whether and how European Portuguese dictionaries have incorporated 
them. The authors focus on four candidates: COVID-19, coronavirus, pandemia, and 
tele-, and demonstrate with many examples how these are incorporated into new 
morphological formations, illustrating how vital the lexical neology process in the 
domain of COVID-19 in a rather short period of time (2020/2021) actually was. 

This study also sheds light on how online dictionaries find different ways of re- 
acting to sudden vocabulary expansion, but also on how the Portuguese language 
was adapted to the new situation by all its speakers. The authors state (which is also
true of many of the examples given in other chapters in this volume): “Only the
future will tell whether the creative linguistic phenomenon that emerged from the 
pandemic will persist in the Portuguese language (namely the loss of the neologism 
status of particular units while being incorporated in the current language lexicon) 
or whether it will be a source of occasionalisms circumscribed in time and space 
while the COVID-19 outbreak lasts.” 

Ieda María Alves, Beatriz Curti-Contessoto, and Lucimara Costa present data on
Brazilian Portuguese COVID-19 terminology in their paper COVID-19 terminology 
and its dissemination to a non-specialised public in Brazil. Their corpus-based 
study “aims to detect, analyse and discuss the characteristics of COVID-19 terminol- 
ogy, in particular the role of the adjective novo [new] in this terminology, the high 
recurrence of terms in the plural and the resemantisation of some of the terminolog- 
ical units used”. 

Their ultimate goal is to create a “terminological dictionary aimed at non- 
specialised readers in the medical field with little formal education”, in which the 
terms will be presented onomasiologically. As the intended user group comprises a 
high percentage of functionally illiterate people, the terms will be defined using 
plain language. The paper exemplifies the manifold lexicographic problems arising 
when dealing with new terms from the Coronavirus pandemic in such a setting. 

In their paper Neoterm or neologism? A closer look at the determinologisation 
process, Rute Costa, Margarida Ramos, Ana Salgado, Sara Carvalho, Bruno Almeida, 
and Raquel Silva focus on new lexical units in the Portuguese media discourse and 
their formation, categorization, and lexicographic description. In particular, words formed
with covid- are collected and analyzed with regard to the question “whether these words
can be considered neoterms or, on the contrary, if having a term in their formation 
corresponds to a false neological intuition. In the latter case, rather than a neoterm, we
have a neologism resulting from a process of determinologisation.” 

The authors also discuss several issues regarding the inclusion of such “neo- 
terms” in dictionaries, for example their definition and which domain label should 
be used. In a template proposal for a lexicographic entry, the authors present their 
ideas and reflect on the “dictionary as a language model,” giving “descriptive guid- 
ance” to its users. 

In some ways different, but also comparable, problems arise for Sign Language 
and its lexicographic description, as shown by Mireille Vale and Rachel McKee in 
their paper Neologisms in New Zealand Sign Language: A case study of COVID- 
19 pandemic-related signs. New signs for suddenly very frequently used new termi- 
nology regarding COVID-19 had to be created, conventionalized and disseminated 
throughout the community of Deaf people in New Zealand. The authors also aim “to 
explore how and when such neologisms could be entered in the ODNZSL” (Online 
Dictionary of New Zealand Sign Language). The data on signs related to COVID-19 was 
collected from two sources: signs that were contributed to NZSL Share, a web-based 
platform where users can upload sign videos etc., and signs used by interpreters (e.g. 
while translating TV briefings on the Corona pandemic). To form the new signs, dif- 
ferent strategies were used including “semantic extension; coinage of new words 
through language-internal mechanisms such as derivation or compounding; and 
drawing on language-external resources, as calques or direct loans”. 

Regarding the lexicographic treatment of such lexical innovations, problems similar
to those for spoken languages arise, as signs to be included in a dictionary
should be fixed, used over a longer period of time outside the original context and 
widely throughout the whole Deaf community. Using a crowdsourcing platform like 
NZSL Share seems to be a promising tool to find and spread Sign neologisms that 
then help to update ODNZSL. 

In the closing paper of this volume, Using Wiktionary revision history to un- 
cover lexical innovations related to topical events: Application to Covid-19 
neologisms, Franck Sajous explores how data from current revisions in Wiktionary
(here demonstrated with the English and the French versions) can be used to
find candidates for COVID-19 neologisms for inclusion in other dictionaries (in addi- 
tion to exploring media corpus data), thus enabling lexicographers “to monitor, an- 
alyse and report quickly a sudden inflow of lexical changes”. After explaining his 
methodology (data processing, ranking new and existing headwords, and annota- 
tion of headwords), the author presents his results. Here, readers learn about the 
different contributor types and how existing and new entries are ranked quarterly 
and annually, as well as about false negatives.

The study is “based on the hypothesis that Wiktionary's most heavily modified 
articles can help detect new and existing headwords that are related to topical 
events”, which could be validated for COVID-19 neologisms, at least regarding the 
English and French Wiktionary versions with very active online communities. It
remains to be seen, however, whether the method described here will be able to
detect lexical innovations related to topical events with a smaller impact than the 
Corona pandemic evidently had. 
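
The hypothesis that heavily modified articles point to topical headwords can be illustrated with a small ranking routine. This is a minimal sketch, assuming a revision log has already been reduced to pairs of headword and revision date. The log below is invented, and the names `quarter` and `rank_by_quarter` are hypothetical; the sketch does not reproduce the author's processing pipeline or his annotation step.

```python
from collections import Counter
from datetime import date

# Hypothetical revision log: (headword, date of revision). Invented for illustration.
revisions = [
    ("coronavirus", date(2020, 3, 2)),
    ("coronavirus", date(2020, 3, 9)),
    ("lockdown", date(2020, 3, 15)),
    ("coronavirus", date(2020, 5, 1)),
    ("table", date(2020, 2, 11)),
    ("lockdown", date(2020, 4, 20)),
]


def quarter(d):
    """Map a date to a (year, quarter) pair, e.g. 2020-03-02 -> (2020, 1)."""
    return (d.year, (d.month - 1) // 3 + 1)


def rank_by_quarter(revs):
    """Return {(year, quarter): [(headword, n_revisions), ...]}, most revised first."""
    per_quarter = {}
    for headword, d in revs:
        per_quarter.setdefault(quarter(d), Counter())[headword] += 1
    # Sort by descending revision count, then alphabetically to break ties.
    return {
        q: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
        for q, c in per_quarter.items()
    }


ranking = rank_by_quarter(revisions)
```

A ranking of this kind only yields candidates. As the toy entry `table` shows, an unrelated headword can appear in the same quarter, which is why a manual annotation step remains necessary.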

Overall, the findings of the studies in this volume focus on how lexicographic 
work regarding COVID-19 neologisms has been done and could be improved, either 
by exploring corpora and other data more systematically, by incorporating users’
expertise into the lexicographic process, or by learning from the lexicographic prac- 
tice of existing dictionaries. Many authors also stress how strongly lexicographic 
work was affected by the COVID-19 pandemic and its repercussions on the vocabu- 
laries of languages around the world in many different ways, but also how, due to 
such challenges, steps were taken to improve lexicographic work. We hope that the 
discussion regarding these and other questions related to lexicography and neology 
in the context of the COVID-19 pandemic and beyond will continue and that this 
volume contributes to it in a fruitful way. 

We would like to express our gratitude to the editorial board of Lexicographica. 
Series Maior, primarily Stefan Schierholz, who quickly accepted our proposal for 
this volume into the series. And we thank the authors, at the heart of this publica- 
tion, for their contributions and trust in us. 
Danica Salazar, Kate Wild
The Oxford English Dictionary 
and the language of Covid-19 


1 Introduction 


Since the beginning of 2020, the Covid-19 pandemic has dominated public discourse 
and introduced a wealth of words and expressions to the general vocabulary of En- 
glish and other world languages. The lexical adaptation necessitated by this global 
health crisis has been unprecedented in speed and scope, and in response, the Oxford 
English Dictionary (OED) has continually revised its coverage, publishing special up- 
dates of Covid-19-related words in 2020 outside of its usual quarterly publication 
cycle. This article describes how OED lexicographers have analysed language corpora 
and other text databases to monitor the development of pandemic-related words and 
provide a linguistic and historical context to their usage. 


2 Neologisms of the Covid-19 pandemic 


The principal research tool that OED editors use to track the emergence of new 
words and senses to be considered for addition to the dictionary is Oxford Languages’
monitor corpus of English (henceforth the Oxford Monitor Corpus), which
currently contains over 14 billion words of web-based news content from 2017 to 
the present day, and is updated each month. Once a word is identified from the cor- 
pus as a candidate for inclusion, editors carefully research both print sources and 
digital text databases to make sure that there are various independent examples of 
the word being used, for a reasonable amount of time and reasonable frequency in 
the types of text in which one would normally expect to find it (see Diamond 2015). 
There is no exact timespan and frequency threshold for inclusion, as this may vary 
depending on the type of word. Some words are added to the OED after a relatively 
short period of time because of their huge social impact, and this has never been 
truer than in the case of perhaps the most important new word to come out of the 
Covid-19 pandemic — the word Covid-19 itself. 


2.1 The term Covid-19


Several of the lexical innovations that emerged during the pandemic are completely 
new words, or neologisms, the most notable being the name given to the disease at the 
root of the crisis. Covid-19 first appears in a situation report published by the World 
Health Organization (WHO) on 11 February 2020 as the official name of the disease 
caused by the virus provisionally called 2019 novel coronavirus (2019-nCoV) and later formally
named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Covid-19 is
shortened from coronavirus disease 2019, and follows the WHO's recently adopted best 
practices for naming new human infectious diseases (WHO 2015). Covid-19 is an accessi- 
ble term that indicates the disease's causal pathogen and year of first detection, while 
avoiding certain geographic, ethnic, cultural, or occupational references that could lead 
to stigmatizing associations with a particular place or group of people, as had happened 
in earlier pandemics (e.g., Spanish flu, gay cancer for AIDS) (Deang/Salazar 2021). 

In the months following its coinage, Covid-19 underwent an exponential rise in 
usage rarely seen by lexicographers in such a short period of time. By April 2020, when 
it was added to the OED, it was one of the top five nouns in the dictionary's monitor 
corpus data for that month (the other four were people, time, year, and coronavirus), 
and by May 2020 it had overtaken the word coronavirus in frequency (see Figure 1).¹

One of the challenges with adding extremely new words to dictionaries is that 
usage may be unfixed. Stewart (2020) explains that when Covid-19 was first added to 
the OED in April 2020, it was defined as “an acute respiratory illness”, but in another 
special dictionary update in July 2020, this definition was changed to “a disease ... 
characterized mainly by fever and cough, . . . capable of progressing to pneumonia, 
respiratory and renal failure, blood coagulation abnormalities, and death”, in order to 
reflect new information about the effects of the virus on multiple organ systems. 

Another aspect of usage which may be subject to change is spelling. There has 
been quite a lot of discussion online, especially earlier in the pandemic, about 
whether Covid-19 should be spelled with an initial capital (as in this article) or with 
full capitals, COVID-19, and different official bodies and news organizations follow 
different practices. This is the kind of information for which people often turn to a 
dictionary. Corpus frequencies help to show the most typical use, and in this case it 
has been found that there is considerable regional variation (see Figure 2): the form 
with only initial capital is more frequent in the United Kingdom, while there is a clear 
preference for the all-capital form in the United States. English speakers in Ireland, 
New Zealand, and South Africa also lean towards the initial-capital form, while those
in Canada, Australia, and India prefer the all-capital form. There may be fluctuations
as time goes on, and this is something the OED will continue to track. Following its
usual style, the OED entry gives the most common British form as the headword but
lists the other forms as variants.


1 Throughout this article, charts show frequencies per million tokens. (Tokens are the smallest
units of a corpus, typically either words or punctuation marks.) Also, variant spellings and inflected
forms are included: for example, figures for Covid-19 include those for Covid19, COVID-19, etc. (unless
stated otherwise, as in Figure 2); figures for frontliner include those for frontliners, front-liner,
and so on.


Figure 1: Frequency of Covid-19 and coronavirus in the Oxford Monitor Corpus, October 2019
to October 2020.
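

The per-million normalization and the folding of variant spellings that underlie the charts in this article can be illustrated with a minimal Python sketch. The token list, the regular expression, and the function names are hypothetical and chosen only for illustration; they are not the OED's actual tooling or data.

```python
import re
from collections import Counter

# Hypothetical pattern folding Covid-19, COVID-19, Covid19, CoVID-19, etc.
COVID19 = re.compile(r"^covid-?19$", re.IGNORECASE)


def per_million(count: int, total_tokens: int) -> float:
    """Normalized frequency: occurrences per million tokens."""
    return count * 1_000_000 / total_tokens


def variant_counts(tokens):
    """Count each surface form of the headword separately."""
    return Counter(t for t in tokens if COVID19.match(t))


# Toy token list, invented for illustration (punctuation marks count as tokens).
tokens = ["Covid-19", "cases", "rose", ",", "COVID-19", "tests", "Covid19", "."]

forms = variant_counts(tokens)
combined = sum(forms.values())          # all variants folded together
print(forms)
print(per_million(combined, len(tokens)))
```

Keeping the per-form counts alongside the combined figure allows the same data to serve both a folded frequency chart and a comparison of spellings.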

Figure 2: Relative frequencies of Covid-19/COVID-19 in selected varieties of English in the Oxford
Monitor Corpus, as of July 2021. “Other” includes CoVID-19, Covid19, etc. 


2.2 Other neologisms


In addition to Covid-19 itself, numerous other new words have entered the English 
language as a result of the pandemic. These include shortenings denoting Covid-19 
or the coronavirus, like Covid (first recorded in a tweet on the same day that Covid-19 
was coined), corona, rona (particularly frequent in colloquial US and Australian use), 
C-19, and nCoV. 

While pandemic-related words such as lockdown have been widely borrowed 
from English into other languages (see section 6), borrowing into English seems to 
have been a less common source of new vocabulary during the time of Covid-19. 
There are exceptions, such as Hamsterkauf, a German word meaning ‘panic buying’
(from the idea of a hamster, German Hamster, hoarding food in its cheeks) and
occasionally used as an English word early in 2020: examples from the Oxford Monitor
Corpus include “supermarkets are experiencing a wave of hamsterkauf” and
“the initial hamsterkauf phase of the pandemic”. However, most of the uses refer to
the word as a loanword from German, and hamsterkauf seems not to have taken 
root as a naturalized English word. 

Much more productive methods of neology during the pandemic have been 
blending and compounding. Covid- and corona- have been particularly productive 
elements, especially in covidiot and also in more ephemeral formations like covidivorces
‘divorces prompted by the stress of lockdown’ or, more positively, coronials
‘the generation of babies born during lockdown’ (from Covid and millennials). As
face-to-face interactions began to be prohibited and the videoconferencing software
Zoom became ubiquitous, coinages such as Zoombombing, Zoom-ready, and zumping
‘dumping someone over Zoom’ emerged. (The OED also added an entry for the
use of Zoom as a verb.) There have also been various coinages formed with -demic 
(from pandemic or epidemic), such as twindemic, referring to a hypothetical pair of 
pandemics occurring at the same time, and sceptical formations like scamdemic 
and plandemic. Many other new blends and compounds, some serious and some more
playful, have been created, including anthropause (the global slowdown of travel
and other human activity during the pandemic), pancession (an economic recession 
caused by a pandemic), isodesk (a home workplace), maskne (acne caused by wear- 
ing a face mask), and a plethora of words denoting alcoholic drinks consumed dur- 
ing lockdown or self-isolation, such as quarantini and locktail. 

Although some of these words have experienced widespread popularity, many 
are likely to be quite short-lived, and have not yet been added to the OED, but the 
dictionary's editors will continue to track their development. Criteria for determin- 
ing whether and when to add a word to the OED include longevity, frequency of 
occurrence, and variety of sources, although there are no rigid rules and each word 
is considered on its own merits (see further Diamond 2015). 


3 Words with new senses or new significance


Not all of the lexical developments during the pandemic have been completely new 
words — in fact, our corpus monitoring has shown that most of them are existing 
words that have developed new senses or gained special significance as a result of 
the pandemic. 


3.1 Corpus keywords 


Table 1 shows the top ten keywords in the Oxford Monitor Corpus for the first seven 
months of 2020. Keywords are words that appear significantly more frequently in 
one part of the corpus than in the corpus as a whole (Kilgarriff 2009), so these are 
words that were particularly frequent in the given months. Keywords relating to the 
coronavirus crisis are highlighted in bold in the table, where it can be observed that 
Covid-19, Covid, and other abbreviations are the only actual neologisms. The other 
keywords are words with a longer history, like coronavirus, lockdown, pandemic, 
furlough, and covering. A list such as this is used by OED lexicographers to check 
against the dictionary's coverage so as to determine whether any new information 
needs to be added to existing entries. 
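
One simple way to compute keywords of this kind is a smoothed ratio of normalized frequencies, in the spirit of the approach associated with Kilgarriff (2009). The sketch below is only an illustration: the counts are invented, the function names and the smoothing constant `n` are assumptions, and nothing here is claimed to be the exact calculation behind Table 1.

```python
def per_million(count, total):
    """Occurrences per million tokens."""
    return count * 1_000_000 / total


def keyness(focus_count, focus_size, ref_count, ref_size, n=1.0):
    """Smoothed ratio of normalized frequencies; n damps very rare words."""
    focus_fpm = per_million(focus_count, focus_size)
    ref_fpm = per_million(ref_count, ref_size)
    return (focus_fpm + n) / (ref_fpm + n)


def top_keywords(focus, ref, k=10, n=1.0):
    """Rank the words of one corpus slice against a reference corpus."""
    focus_size, ref_size = sum(focus.values()), sum(ref.values())
    scores = {
        word: keyness(count, focus_size, ref.get(word, 0), ref_size, n)
        for word, count in focus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented counts: one month (focus) against the corpus as a whole (reference).
month = {"lockdown": 900, "pandemic": 1200, "time": 5000, "the": 92900}
whole = {"lockdown": 1000, "pandemic": 1500, "time": 60000, "the": 1137500}

print(top_keywords(month, whole, k=2))
```

In this toy data, time and the occur at about the same rate in both samples and so score close to 1, while lockdown and pandemic are concentrated in the focus month and rise to the top of the list.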


Table 1: Top 10 keywords in the Oxford Monitor Corpus, January to July 2020.


Jan 2020       Feb 2020             Mar 2020             Apr 2020      May 2020            June 2020    July 2020
bushfire       Covid-19             Covid-19             PPE           reopen              defund       covering
coronavirus    coronavirus          pandemic             lockdown      lockdown            Juneteenth   Covid
Iranian        quarantine           distancing           pandemic      Covid-19            brutality    in-person
SARS           pandemic             coronavirus          ventilator    pandemic            anti-racism  mask
Iraqi          virus                self-isolate         stay-at-home  Covid               racism       mask-wearing
sign-stealing  outbreak             lockdown             Covid-19      distancing          Covid        pandemic
koala          caucus               self-isolation       furlough      hydroxychloroquine  Confederate  distanced
virus          locust               sanitiser/sanitizer  distancing    covering            looting      Covid-19
impeachment    infect               quarantine           coronavirus   furlough            covering     SARS-CoV-2
airstrike      epicentre/epicenter  ventilator           N95           stay-at-home        kneel        pre-pandemic


The list of keywords presents a fascinating overview of changing global events
and concerns in the first seven months of 2020. In January and February, some of the 
keywords were related to the coronavirus; others referred to different world events 
such as the Australian bushfires, the assassination of Qasem Soleimani, Donald 
Trump's impeachment, the Democratic caucuses, locust swarms in East Africa, and 
investigations into the Astros sign-stealing scandal. In March, however, every one of 
the top ten keywords was in some way related to coronavirus. This remained the case 
until June, when many of the keywords reflected the impact of the Black Lives Matter 
movement and the protests following the killing of George Floyd on 25 May 2020. 

It is also revealing to compare the pandemic-related keywords in the table. In Jan- 
uary 2020, the words mainly related to naming and describing the virus: coronavirus, 
SARS, virus. By March, April, and May the keywords reflected the social and economic 
impact of the virus: social distancing, self-isolation, quarantine, and lockdown were all 
especially frequent, as was furlough, following the introduction of the UK’s Coronavi- 
rus Job Retention Scheme in late March. Issues surrounding the medical response are 
reflected in the keywords PPE (for personal protective equipment) and ventilator. 

In May 2020, we saw the first signs of looking ahead to life post-lockdown, with 
reopen as the top keyword. This trend continued in July, when there was an inter- 
esting pattern of contrast with virtual life as people started thinking about or tenta- 
tively restarted face-to-face interaction: in-person increased in frequency, used in 
contexts which previously would not normally have been necessary (since the “in- 
person” version was the norm), as in in-person worship and in-person graduation. 

In July 2020, the top keyword was covering, overwhelmingly in face covering or 
in other uses referring to face masks (including facial covering or simply covering in 
this sense, in examples such as “shop staff do not have to wear coverings”). Mask 
and mask-wearing also appeared as keywords in July, reflecting ongoing discussions
about when and where masks and coverings should be worn.²

Analysis of these keywords prompted a number of additions and updates to the 
OED. For example, new entries for self-isolation (and related terms), face covering, 
and PPE were added (although these terms are not completely new; see section 3.3).
Some entries were revised to account for shifts in use or emphasis: for example, the 
relevant sense of lockdown was updated to include the public health measure aspect, 
while the entry for furlough was fully revised, including a new comment about the 
spread of the sense “dismissal or suspension from employment” from the US to the 
UK and other countries. 


2 The foregoing discussion draws on the analysis in Wild (2020a) and Wild (2020b). Table 1 shows 
keywords only up to July 2020: after this month, the corpus keyword lists were less dominated by 
Covid-19-related topics, and reflected other events such as the US presidential elections. However, 
Covid-19 certainly continued to be a theme, with keywords including pre-Covid, and, from the end 
of 2020 and beginning of 2021, vaccine and related words. 


3.2 The term coronavirus


As noted above, many of the words used in the context of the pandemic are not 
completely new but were relatively uncommon before 2020. This is the case with 
the word coronavirus itself, the name of the group of enveloped, single-stranded 
RNA viruses of which the causal pathogen of Covid-19 is a member. The word coro- 
navirus is first recorded in the OED in 1968, in an article in Nature, but before the 
Covid-19 pandemic its use was mainly confined to medical and scientific specialists. 
This is reflected in corpus data: as shown in Figure 1 (section 2.1), coronavirus was 
relatively rare in general news media before 2020; by March 2020, it was dominat- 
ing the global conversation. 

One way of illustrating the extent to which the word coronavirus became over- 
whelmingly frequent at the beginning of the pandemic is to compare its frequency 
with that of other significant words at the time. Figure 3 compares the frequency of 
coronavirus with that of words referring to other major news topics in 2019 and 2020 — 
climate, Brexit, and impeachment — and shows that coronavirus was over ten times as 
frequent as any of these words at its peak. Figure 4 shows that, by March 2020, coro- 
navirus was as frequent as one of the most commonly used nouns in the English 
language, time. 

Figure 3: Frequency of coronavirus, climate, Brexit, and impeachment in the Oxford Monitor
Corpus, December 2019 to March 2020. 


Figure 4: Frequency of coronavirus and time in the Oxford Monitor Corpus, January 2020
to March 2020.


As with Covid-19, changes have been made to the OED’s entry for coronavirus in
light of developments during the pandemic (Stewart 2020). A second sense has
been added to refer specifically to those coronaviruses that cause life-threatening
diseases in humans, including SARS (Severe Acute Respiratory Syndrome), MERS
(Middle East Respiratory Syndrome), and Covid-19. Additionally, since the name of
a disease often also ends up being applied to the pathogen causing it, and vice
versa, both Covid-19 and coronavirus began to be used interchangeably for the disease
and the virus; again, this has been reflected in the updated OED entries.


3.3 Older words newly added to the OED 


Social distancing was one of the entries added to the OED in its first special pandemic 
update in April 2020. As can be seen from Figure 5, this term saw an enormous in- 
crease in usage: its frequency was negligible before 2020; then by April 2020 it was 
occurring over 250 times for every million tokens in the Oxford Monitor Corpus, 
which is roughly the same frequency as that of the word food. 

Figure 5: Frequency of social distancing in the Oxford Monitor Corpus, October 2019
to October 2020.


However, when OED lexicographers researched this word, consulting databases
of books, newspapers, journals, and other types of written sources, they found that
social distancing is far from being a new term. It dates back to 1957, originally signifying
an aloofness or deliberate attempt to distance oneself from others socially. Only
decades later did it acquire the now more familiar sense of limiting physical
contact in order to avoid infection, but even this sense goes back almost two decades,
to 2004. This antedating of what may initially seem to be obvious neologisms is
something that often occurs when a lexical item is researched for the OED, and there
are several examples of such terms from the Covid-19 pandemic: self-quarantine dates
back to 1876 as a noun and 1918 as a verb, elbow bump to 1902, and contact tracing to
1910; face covering is first recorded in 1732, and first used in a medical context in 1946.
Although these terms were new to the OED when they were added as a result of the
pandemic, they are not completely new to the language.

Some expressions were coined during previous public health crises or for other 
kinds of emergencies but have been revived during the time of Covid-19. Info- 
demic, a blend of information and epidemic, was coined in 2003 during the SARS 
epidemic to refer to the outpouring of often unsubstantiated information relating to 
a crisis, and was then also widely used to describe the proliferation of news around 
coronavirus. The phrase shelter-in-place, a protocol instructing people to find a 
place of safety in the location they are occupying until the all clear is sounded, was 
devised as an instruction for the public in 1976 in the event of a nuclear or terrorist 
attack but was then adapted as advice to people to stay indoors to protect them- 
selves and others from disease (Paton 2020). 


3.4 New senses or nuances of existing words 


Collocational information gleaned from corpus data helps OED lexicographers under- 
stand the contexts surrounding the usage of a word and discover particular nuances 
or senses. For example, in the dictionary’s entry for frontline, the sense of the adjective
as used in frontline worker/employee/staff, etc., had been defined as “of a person:
working at the forefront of an organization's public activity, typically as the point of
direct contact with customers, clients, users of the organization’s services, etc.” This
was an accurate summary when the entry was first revised a few years ago, but the
focus of the sense has shifted during the Covid-19 pandemic. OED editors compared
salient collocates of frontline in 2020 with those of previous years and found that al- 
though some had remained unchanged — frontline staff is one consistently common 
collocation — others, such as the following, stood out as much more frequent in 2020: 
frontline nurse/medic/caregiver; frontline healthcare/health-care workers; frontline 
warrior/hero; courageous/heroic frontline workers; essential frontline worker. This 
very positive sentiment associated with frontline workers, and the focus on such 
workers as carrying out essential roles, especially in health care, led to the OED definition
being expanded as follows: “of a person: working at the forefront of an organization's
public activity, typically as the point of direct contact with customers,
clients, users of the organization's services, etc., (now) esp. designating such an employee
who provides a service regarded as vital within the community, such as a
health-care worker, teacher, etc.; often in frontline worker”.

Another new pandemic-related sense added to an existing OED entry is bubble. 
Bubble is a word with a long history, its literal sense dating to the Middle Ages and 
various figurative senses (mainly relating to either impermanence or protection) to 
the Early Modern period. In 2021 a new sense was added, “a group consisting of a
restricted number of people who have a close relationship or regular social contact;
(later) spec. such a group whose members are, under public health measures, permitted
to be in close physical proximity”. The first, general strand of this sense
dates back to 2000, but the OED definition notes that the later specific strand “arose
in 2020 as part of the official recommendations of some governments in response to
the Covid-19 pandemic”. Again, the emerging new sense is discernible from corpus
data: a comparison of collocates in 2020/2021 and previous years highlights new or 
newly significant collocations such as household bubble and support bubble. 
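
The kind of collocate comparison described in this section can be sketched in a few lines of Python. This is a deliberately simplified illustration: the two token lists are invented, the function names and the `min_ratio` threshold are assumptions, and real collocation analysis would normally use a much larger window of text and a statistical association measure.

```python
from collections import Counter


def collocates(tokens, node, window=1):
    """Count words occurring within `window` tokens to the right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1 : i + 1 + window])
    return counts


def newly_salient(before, after, min_ratio=2.0):
    """Collocates that are new, or whose share grew by at least `min_ratio`."""
    before_total = sum(before.values()) or 1
    after_total = sum(after.values()) or 1
    result = []
    for word, count in after.items():
        share_after = count / after_total
        share_before = before.get(word, 0) / before_total
        if share_before == 0 or share_after / share_before >= min_ratio:
            result.append(word)
    return sorted(result)


# Invented mini-samples standing in for pre-2020 and 2020 corpus slices.
earlier = "frontline staff met frontline staff and frontline managers".split()
later = "frontline nurse thanked frontline staff and frontline hero".split()

print(newly_salient(collocates(earlier, "frontline"), collocates(later, "frontline")))
```

Here staff appears in both samples and is therefore not flagged, mirroring the observation that frontline staff remained consistently common, while nurse and hero stand out as new.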


4 The spread of scientific terminology 
in general discourse 


Another notable feature of the language of the pandemic has been the way that it 
has introduced scientific terms into general discourse. As both scientists and the 
public endeavoured to increase their understanding of the coronavirus and its ef- 
fects, specialist scientific and medical language became increasingly prominent. 
This development has already been discussed with reference to the term coronavi- 
rus (see section 3.2), and it is reflected in numerous other words in various fields. 
Lexical items from the field of epidemiology that were previously known mainly 
to the scientific and medical community were suddenly being heard in the news 
and in everyday conversation. For example, reproduction number, reproductive 
number, R number, or simply R, became widely used as people became preoccupied 
with “getting the R down”. This crossing over from specialist to general vocabulary 
is reflected in the set of quotations in the OED entry for this sense of R: the earliest
use is from the proceedings of an epidemiology conference published in 1975, while 
the most recent example is from a news article. 

Other epidemiological terms such as community transmission, community spread, 
case fatality rate, and flattening the curve became widespread enough to merit inclu- 
sion in the OED’s first special pandemic update published in April 2020. In its second 
special update three months later, the dictionary focused even more on scientific and 
medical terminology, adding such terms as cytokine storm “an overactive immune re- 
sponse occurring in various infectious and non-infectious diseases, characterized by 
the excessive production of cytokines and resulting in intense localized or general- 
ized inflammation”; spike protein “a glycoprotein projecting from the envelope which 
binds to a receptor on the host cell and facilitates entry of the viral genome into the 
host cell”, and CPAP “continuous positive airway pressure, a method of respiratory 
therapy in which air at a pressure higher than atmospheric pressure is pumped into 
the lungs through the nose or nose and mouth during spontaneous breathing”, a less 
invasive treatment for Covid-19 patients than one involving a ventilator (Stewart 
2020). Again, these terms are not completely new, but they have become widely fa- 
miliar to non-specialists as a result of the pandemic. 


5 Regional variation 


We have discussed the use of corpus data to identify new words, spikes in frequency, 
and shifts in collocation and other aspects of usage. Corpora also provide useful in- 
formation about the distribution of a word in different varieties of English, which is 
reflected in the labelling and metadata of new or revised dictionary entries. 

In the case of self-isolate, self-quarantine, and related words, OED editors work- 
ing on these terms felt that although there are technical differences between them, 
they are often used interchangeably, the main difference being in regional distribu- 
tion. To confirm this, they looked at various corpora. The clearest picture can be seen 
in the Coronavirus Corpus, a corpus of news articles relating to Covid-19 (Davies 
2019-), which shows that self-quarantine is more common in the United States than 
in Canada, Great Britain, Ireland, Australia, and New Zealand, where self-isolate 
and self-isolation are preferred (see Wild 2020b). A note to this effect has been added 
to the OED’s updated entry for self-quarantine, v.: “In recent use, in the context of the
Covid-19 pandemic, self-isolate and self-quarantine have often been used interchange- 
ably, with self-quarantine being more common in the United States”. 
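
A comparison of competing variants across regional subcorpora can be sketched as follows. The counts are invented for illustration, and the region codes and function names are assumptions; they do not reproduce figures from the Coronavirus Corpus or the Oxford Monitor Corpus.

```python
def variant_shares(counts_by_region):
    """For each region, return each variant's share of the combined total."""
    shares = {}
    for region, counts in counts_by_region.items():
        total = sum(counts.values())
        shares[region] = {v: c / total for v, c in counts.items()} if total else {}
    return shares


def preferred(counts_by_region):
    """Most frequent variant in each region that has any data."""
    return {
        region: max(counts, key=counts.get)
        for region, counts in counts_by_region.items()
        if counts
    }


# Invented counts for two regional subcorpora.
toy_counts = {
    "US": {"self-quarantine": 60, "self-isolate": 40},
    "GB": {"self-quarantine": 10, "self-isolate": 90},
}

print(variant_shares(toy_counts))
print(preferred(toy_counts))
```

Working with shares rather than raw counts keeps subcorpora of different sizes comparable, which is why charts of regional variation usually plot relative frequencies.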

Corpus frequency data also enabled OED editors to analyse the regional distribu- 
tion of the word frontliner. They discovered that although it is used worldwide, it is 
particularly frequent in Southeast Asia, especially in the Philippines and Malaysia 
(see Figure 6); in other countries the more usual term is frontline worker or similar. 

Figure 6: Frequency of frontliner in selected varieties of English in the Oxford Monitor Corpus,
July 2020. 


For this reason, the OED entry for the relevant sense of frontliner is labelled “now 
chiefly South-East Asian”. 

Also showing some interesting geographic variation are the names for the set of 
measures that many countries have taken to contain the spread of the virus by se- 
verely limiting the movement of people outside the home. Lockdown is the word 
with the most widespread use and is the preferred term in countries such as the 
United Kingdom, Canada, and Australia. In the United States the coronavirus re- 
strictions are called shelter-in-place. The word iso, short for isolation, is also used 
colloquially, especially in Australia and the United States. In Malaysia, the initial- 
ism MCO is used, short for movement control order, while in the Philippines, ECQ is 
preferred, short for enhanced community quarantine — both phrases are the official 
government designations for these countries' stay-at-home regulations. In Singapore,
there was a remarkable spike in usage of the term circuit breaker in April 2020
when it was adopted by the Singaporean government as the name for its strict quar- 
antine measures (see Figure 7). Known to most people as a safety device that stops 
the flow of current in an electric circuit, circuit breaker is also familiar to those in 
finance as a regulatory instrument designed to prevent panic selling by temporarily 
stopping trading on an exchange. While it makes sense for a global business hub 
such as Singapore to have adapted a piece of finance slang in such a way, it is note- 
worthy that later in 2020, in September and October, circuit breaker also became a 
much-used term in British English, describing a short, fixed-term set of restrictions 
which scientists recommended the government should implement in order to stem 
another incoming tide of coronavirus infections (see Figure 8). 

Figure 8: Frequency of circuit breaker in the UK and US components of the Oxford Monitor Corpus,
March to October 2020. 


Local responses to the coronavirus pandemic have also resulted in several neolo- 
gisms in different varieties of English. In the Philippines, Filipinos from other regions 
stranded in a locked-down Manila are referred to as LSIs, short for locally stranded 
individuals; in Singapore, a person who needs to self-isolate is issued an SHN or stay- 
home notice; while in India those who wish to cross internal borders need to have an 
e-pass, an official government document authorizing a person’s movement during 
quarantine. Australians try to keep themselves safe from the virus through the regu-
lar use of sanny (hand sanitizer), while West Africans wash their hands using the Ve- 
ronica bucket — a type of sanitation equipment composed of a covered bucket with a 
tap fixed at the bottom and a bowl fitted below it to collect wastewater, named after 
its Ghanaian inventor, Veronica Bekoe (Salazar 2020). 

A Southeast Asian term added to the OED in 2016 suddenly gained global noto- 
riety at the outset of the Covid-19 pandemic: wet market. This term, first attested in 
1978, was originally used only in Southeast Asian countries to refer to a market for 
the sale of fresh meat, fish, and produce, an essential part of the region's food sup- 
ply chain. However, the identification of a Wuhan market as ground zero for the 
coronavirus outbreak led people outside of Southeast Asia to incorrectly conflate 
wet markets with illegal wildlife markets, subjecting wet markets to much public 
criticism (Lim 2020) and causing a considerable increase in the usage of the term in 
the early months of 2020 (see Figure 9). 

Figure 9: Frequency of wet market in the Oxford Monitor Corpus, October 2019 to October 2020.


6 Languages other than English 


The lexical monitoring carried out by Oxford Languages' lexicographers has informed
the coverage of the pandemic lexicon not only in the OED, but also in other Oxford 
dictionaries of current English and even in Oxford dictionaries of other languages. 
Key terms related to Covid-19, such as the neologisms and newly prominent words 
mentioned throughout this article, were translated into 19 different languages by Ox- 
ford University Press editors and translators in Oxford and in its international offices 
in China, India, East Africa, and South Africa so that new words and senses could be
incorporated into its monolingual and bilingual dictionaries of these languages. 
These translations into Afrikaans, Arabic, Catalan, Chinese, Dutch, Filipino (Taga- 
log), French, German, Hindi, Italian, Northern Sotho, Portuguese, Setswana, Spanish, 
Swahili, Tamil, Telugu, Xhosa, and Zulu have also been made freely available as 
downloadable resources online.³

The translation of key coronavirus words into such diverse languages has also 
provided important insights into the impact of the pandemic on these languages. 
There are some commonalities with English: the emergence of new words and 
senses, the increased significance of medical and scientific terminology, and the 
prominence of expressions referring to government and individual actions aimed at 
containing the spread of the virus and mitigating its social and economic effects. 
There are also interesting differences. For example, the English word lockdown has 
been borrowed by several languages including Dutch, Filipino, German, Italian, 
and Telugu, while other languages prefer their equivalent forms for confinement, 
for example, confinament for Catalan, confinement for French, confinamento for Portuguese,
and confinamiento for Spanish. Some languages use corresponding expressions
conveying closure, for instance, إغلاق 'iighlaq for Arabic, 封锁 fēngsuǒ and 封闭
fēngbì for Chinese, and ukuvalwa thaqa kwezwe for Zulu.

The Covid-19 translation project has also highlighted the influence of English, 
the principal language of global scientific communication, on the Covid-19 vocabu- 
lary of these languages. This influence can be seen in some notable lexical innova- 
tions. In Italian, for instance, the word droplet has come to refer not only to the 
very small airborne drops of secretions from the nose, throat, or lungs by which the 
coronavirus can be transmitted, but also to the distance one person must maintain 
from another to prevent such a transmission from happening. 


7 Conclusion 


The OED’s efforts to document the lexical change brought to the English language 
by the coronavirus pandemic continued throughout 2020, culminating with the 
Words of an Unprecedented Year report, which was published at the end of the year 
in place of the usual selection of a single Word of the Year (Oxford Languages 
2020). This expansive report on the words that defined 2020 features an entire sec- 
tion dedicated to the language of Covid-19. 

However, the work did not end there. Further pandemic-related additions and
revisions to the dictionary have been included in the OED's regular quarterly
updates in 2021, with face shield, essential worker, mask up, and the aforementioned
bubble being notable examples. Several more are scheduled to be published
in upcoming updates. OED lexicographers will continue to monitor their in-house
corpora and other language data to identify and document new words and senses
associated with the pandemic that have had such an impact on our language and
our lives.


3 The translations can be downloaded from Oxford Languages' Covid-19 Language Hub:
https://languages.oup.com/covid-19-language-resources/#translations (last access: 12 August 2021).
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Annette Klosa-Kiickelhaus 
German Corona-related neologisms 
and their lexicographic representation 


1 Introduction 


Between January 2020 and July 2021, many new words and phrases contributed to 
the expansion of the German vocabulary to enable communication under the new 
conditions that evolved during the Covid-19 pandemic. Medical and epidemiologi- 
cal vocabulary was integrated into the general language to a large extent. Sud- 
denly, some lexemes from general language were used with very high frequency, 
while other words were used less often than before. These processes of language 
change can be studied in various ways, for example, in corpus linguistics with re- 
spect to the frequency or emergence of certain words in certain types of texts (e.g. 
press releases vs. posts in social media), in critical discourse analysis with respect 
to certain participants of the discourse (e.g. vocabulary of Covid-19 pandemic
deniers),¹ or in conversation analysis (e.g. with respect to new verbal interactions in
greetings and farewells). The rapid expansion of vocabulary has also notably affected
lexicography as a discipline of applied linguistics.

General language dictionaries and terminological dictionaries were quick to reflect
how the German lexicon evolved during the Covid-19 pandemic. For example, new entries
have been added to Digitales Wörterbuch der Deutschen Sprache, a comprehensive
synchronic general language dictionary of German, such as Coronaparty² ‘a privately
organized party held during the corona crisis, bypassing the coronavirus containment
measures’. New senses were also recorded, such as ‘highest school-leaving certificate
based exclusively on school achievements already achieved [...] without taking final
examinations’, attributed to the noun Durchschnittsabitur, which has the older meaning
‘university entrance qualification with a mediocre, average final grade point average’.
Duden publishing house, with its extensive monolingual online dictionary on
contemporary German, Duden online, added new entries like Covid-19³ focusing on
spelling variants (Covid-19 in general and mostly COVID-19 in technical language) and
grammatical information (e.g. in the entry Coronavirus:⁴ in terminology mostly neuter
gender: das Coronavirus, in everyday conversation outside of terminology also
masculine gender: der Coronavirus). Regarding technical lexical items, for example,
the German Bundessprachenamt (‘Federal Language Bureau’) published an online
dictionary, Coronavirus Terminology in German, English, French, Dutch, Polish,
Russian, Spanish, comprising 18,490 terms (e.g. akute respiratorische Erkrankung
‘acute respiratory infection’, Bewegungsprofil ‘movement profile’).


1 See Wengeler/Roth (2020) for several studies on the corona discourse and also Weinert (2021).
2 https://www.dwds.de/wb/Coronaparty (last access: 10 June 2022).
3 https://www.duden.de/suchen/dudenonline/Covid-19 (last access: 10 June 2022).
4 https://www.duden.de/suchen/dudenonline/Coronavirus (last access: 10 June 2022).

The following sections, however, will focus on the ways in which a German neol- 
ogism dictionary project has chosen to capture and document lexicographic informa- 
tion in a timely manner. Both challenges and advantages arise from lexicographic 
practice “at the pulse of time”. The Neologismenwörterbuch is presented as an exam- 
ple that lends itself well to such a discussion because its subject (neologisms) is char- 
acterized as new, innovative, and constantly changing. 


2 German neologisms and neologism dictionaries 


New words emerge in the German language all the time, but not all of them are
neologisms. Because German is a morphologically productive language, many new
compounds or derivations are used only once, and such nonce words are not lexicalized.
Neologisms, on the other hand, are defined here as lexical units or senses/meanings
that emerge in a language community over a specific period of language development
and which (a) diffuse, (b) are generally accepted as language norms, and (c) are
perceived by the majority of speakers as new for some time (cf. Herberg et al. 2004:
XII). There are some indicators for the acceptance of a neologism in German: its
increasing overall frequency, its distribution in many different text types/genres,
and its use in many different discourses. Other indicators tell us how far the process
of lexicalization of new words has developed (cf. Lemnitzer 2010: 69):

— Pragmatic criteria: A neologism is no longer written in quotation marks, hedge 
words (e.g. sogenannt ‘so-called’) are not used any more, and distancing phrases 
(e.g. wie man heute sagt ‘as we say today’) are abandoned. 

— Grammatical criteria: The gender of nouns is invariable; a full conjugational 
paradigm for verbs has developed. 

—  Word-formation criteria: A neologism is used as first and second component in 
an increasing number of compound nouns; a borrowed neologism is combined 
with indigenous lexemes in word-formation products. 
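
The pragmatic criteria in particular lend themselves to a rough mechanical check. The following Python sketch is only an illustration and not the procedure of any dictionary discussed here: it assumes a handful of citations for a candidate word and reports the share in which the word is still set in quotation marks or accompanied by a hedge. The hedge list, the function names, and the example sentences are invented for this sketch.

```python
# Hedge words and distancing phrases named in the text (lower-cased).
# A real list would be longer; this one is an assumption for illustration.
HEDGES = ("sogenannt", "wie man heute sagt")
QUOTE_CHARS = "\"'„“”‚‘’»«"


def _is_quote(ch: str) -> bool:
    """True for a single quotation-mark character (empty strings are rejected)."""
    return len(ch) == 1 and ch in QUOTE_CHARS


def is_hedged(sentence: str, word: str) -> bool:
    """Check whether `word` is quoted or accompanied by a hedge in `sentence`."""
    lowered = sentence.lower()
    start = lowered.find(word.lower())
    if start == -1:
        return False
    end = start + len(word)
    quoted = _is_quote(lowered[start - 1:start]) and _is_quote(lowered[end:end + 1])
    hedged = any(h in lowered for h in HEDGES)
    return quoted or hedged


def hedged_share(citations: list[str], word: str) -> float:
    """Share of citations containing `word` in which it is still hedged."""
    relevant = [c for c in citations if word.lower() in c.lower()]
    if not relevant:
        return 0.0
    return sum(is_hedged(c, word) for c in relevant) / len(relevant)


# Invented example sentences, not corpus data.
citations = [
    "Die sogenannten Impfdrängler sorgen für Ärger.",
    "Der „Impfdrängler“ wurde angezeigt.",
    "Impfdrängler müssen mit einem Bußgeld rechnen.",
    "Viele Impfdrängler wurden ertappt.",
]
print(hedged_share(citations, "Impfdrängler"))  # 0.5
```

A share that falls over time would be one hint, among the others listed above, that a word is losing its perceived novelty; the grammatical and word-formation criteria would need separate checks.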


According to the definition given above, some time (possibly a couple of years) must
pass before a new word can be classified as a neologism, and it will only remain in
this category for some time (possibly a couple of years) before it is no longer
perceived as new. This notion directly impacts the lexicographic description of these
lexemes in neologism dictionaries, which are defined as specialized reference guides
focusing on the description of meaning and usage of such lexemes in a specific
language which became part of the vocabulary of that language at a certain time
(cf. Barnhart/Barnhart 1990; Lemnitzer 2010; Wiegand 1990 for more details). Some
(neologism) dictionaries only record neologisms retrospectively, that is, after their
lexicalization. The Neologismenwörterbuch is a typical, corpus-based example of this
type of dictionary; Quasthoff (2007) is another. Other neologism dictionaries, like
Die Wortwarte, record neologisms at their “moment of birth” (cf. Lemnitzer 2010: 67),
before they are fully lexicalized but while they are nevertheless accepted parts of
the lexicon.

Regarding the remarkably quick extension of the German lexicon during the 
Covid-19 pandemic, that is, in a relatively short time span of less than two years, 
the second type of neologism dictionaries can continue expanding its list of entries 
according to established criteria of lemma inclusion. The other, retrospective type of 
neologism dictionary, however, cannot react directly to the expansion of the lexicon, 
as lexicographers do not know yet whether the new lexemes, phrases or senses/ 
meanings will become generally established parts of the lexicon after some time. 


3 User needs and lexicographic responses 


Many people around the world noticed new words emerging in their languages in the
context of the Covid-19 pandemic, including internationalisms like corona, Covid-19,
and social distancing as well as language-specific ones. Journalists, teachers, staff
of medical or political institutions, etc., reacted to the general interest in these new
lexical items and soon started to publish glossaries with definitions and/or some
encyclopedic information. As early as March 2020,⁵ several daily newspapers (e.g.
Süddeutsche Zeitung, Die Rheinpfalz), news magazines (e.g. Der Spiegel), radio stations
(Bayerischer Rundfunk, Deutschlandfunk), and news programmes (Tagesschau) in
Germany offered corona glossaries with information on neologisms, medical
terminology, etc., to their audience (cf. Möhrs 2020). Many of these are still available
online (June 2022), and some have been updated since then. In addition, several
scientific organizations and other public institutions began to publish (online)
terminological resources (e.g. Corona-Glossar by the Helmholtz Association of Research
Centres, or the above-mentioned multilingual terminological Corona database by
Bundessprachenamt). Finally, some individuals utilized social media to call for
collaboration regarding the collection of corona words.⁶


5 In Germany, the first attested infection with the SARS-CoV-2 virus was registered on
27 January 2020, and by the middle of March 2020, infections were registered in all
federal states. General information on the Covid-19 pandemic in Germany can be found
in Bundeszentrale für politische Bildung (2021).

The first German dictionary project to react quickly to the challenge of offering
information on those words from the medical and epidemiological contexts that
suddenly were of general interest was Digitales Wörterbuch der Deutschen Sprache
(DWDS). By the middle of March 2020, the DWDS-Themenglossar zur COVID-19-Pandemie
(‘Thematic glossary on the Covid-19 pandemic’) was published,⁷ and it
has since been updated continuously.⁸ Duden publishing house updated their online
dictionary on contemporary German extensively in August 2020, and then
more Corona neologisms were added regularly.⁹

In the following sections, more information is given on how the
Neologismenwörterbuch (‘Neologisms Dictionary’) project at IDS Mannheim reacted to the
unprecedented expansion of the German lexicon with new lexemes, phrases and
meanings related to the pandemic.


4 The Neologismenwörterbuch 
4.1 General remarks 


The Neologismenwörterbuch covers new words and new senses or meanings
established since 1991. New entries are compiled continuously, and the reference guide
is published online as part of the dictionary portal OWID
(Online-Wortschatz-Informationssystem Deutsch, ‘Online information system on the German
lexicon’) of the Leibniz Institute for the German Language at Mannheim.¹⁰ In this
dictionary project, the editorial interpretation (evaluation of print and online media)
is combined with a quantitative corpus-linguistic method¹¹ to extract candidates for
inclusion into the dictionary (semi-)automatically.


6 See, for example, the threads of tweets by Nadja Hahn,
https://twitter.com/nadjasnews/status/1334517401359015936 (December 2020), or Lara Fritzsche,
https://twitter.com/larafritzsche/status/1304330059935895552 (September 2020) (last access: 10 June 2022).

7 https://www.dwds.de/themenglossar/Corona (last access: 10 June 2022).

8 At the same time, DWDS had added new corona-related entries to its comprehensive general
language dictionary of contemporary German and had updated several entries regarding the corona
pandemic. For examples, see above.

9 According to a private communication by Kathrin Kunkel-Razum, editorial director of Duden
dictionaries, in October 2021.

10 For the decades of 1991-2000 and 2001-2010, print dictionaries are also available
(Herberg/Kinne/Steffens 2004; Steffens/al-Wadi 2015). The lexicographic concept for these volumes goes
back to the late 1980s (cf. Heller/Herberg/Lange/Schnerrer/Steffens 1988; Kinne 1989) and 1990s
(cf. Herberg 1997 and 1998).

11 For further information on the editorial and corpus-linguistic methods applied to detect
potential neologisms for the Neologismenwörterbuch cf. Klosa/Lüngen (2018).

The Neologismenwörterbuch contains entries on:

- single words, e.g., Spam (‘spam’), 

- multi-word expressions, e.g., Generation Facebook (‘age group of young people
for whom the Internet and, in particular, communication via social networks
are a matter of course’),

- new word formation elements, e.g., [...]holic (used to form nouns denoting
someone who is very intensely involved with something, resembling an
addiction to something),

- new senses of existing words in German, e.g., texten (‘to text’).


For 15 years following its first online publication in 2005, new and comprehensive
entries were regularly uploaded to the dictionary by the end of each year.
Full entries in the Neologismenwörterbuch comprise information on etymology,
orthography, pronunciation, meaning, usage, grammar, word formation, encyclopedic
information, illustrations, as well as frequency and emergence in the corpus. In 2020,
a new, more concise entry structure was established for all words that do not need
more extensive information because, for example, no pragmatic restrictions apply to
their use. These concise entries offer details on grammar, orthography, meaning,
word formation or etymology as well as the decade of emergence. They are
uploaded in thematically related groups on a monthly basis (e.g. April 2021:
agriculture, July 2021: crime, September 2021: fashion).¹² The latest inclusions, covering
terms used in agriculture, crime or fashion, illustrate how new words often center
around a specific new subject. Such thematically related bundles of new items give
users a compact overview of recent lexical-semantic developments within a
specific area.

The Neologismenwörterbuch offers different lists of lemma groups¹³ as well as
a thematic search¹⁴ and other extended search options¹⁵ for both lemma types.
For example, all neologisms dating back to a specific decade (1991-2000,
2001-2010, or 2011-2020), all new phrases, every new word formation element, or
all neologisms assigned to a domain such as sports, medicine, or fashion can
easily be found.


12 For an overview see https://www.owid.de/docs/neo/listen/kurzartikel.jsp (last access: 10 June 2022).
13 See https://www.owid.de/docs/neo/wortartikel.jsp for different lemma groups (last access:
10 June 2022).
14 See https://www.owid.de/docs/neo/gruppen.jsp?grp=1 for thematic search option (last access:
10 June 2022).
15 See https://www.owid.de/docs/neo/suche/index.jsp for extended search options (last access:
10 June 2022).

While dictionary users thus already had many options to access extensive
information on neologisms that are accepted parts of the German lexicon in
Neologismenwörterbuch, the dictionary, in line with its definition of a neologism (see
above), did not offer any information on new words that are not yet fully
lexicalized. Therefore, users of the dictionary were often unable to obtain
details on words that were particularly conspicuous at a particular time in a specific
discourse and thus raised questions concerning their meaning, correct spelling, etc.
This did not mean, however, that the lexicographers had not already collected
these words with some preliminary information in a list of candidates for inclusion in
an internal database. Consequently, the Neologismenwörterbuch project started to
publish a list of monitored words (Wörter unter Beobachtung ‘Monitored Words’)¹⁶
in March 2020. All entries in this list consist of lexical units that have emerged
since 2011, for which only time will tell whether they will diffuse and become
established as language norms. For each of these words, only a (preliminary, rough)
explanation of meaning is given, and usage is illustrated by one or two corpus
citations. Wherever relevant, there are hyperlinks to more encyclopedic information
(e.g. in Wikipedia), and the date of recording the word is noted. When items from
this inventory are described in either comprehensive or concise entries, they are
removed from the list, which is updated quarterly.
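
The structure of such a list item can be pictured as a small record. The sketch below is a hypothetical Python model, not the project's actual database schema: the class name, the field names, and the recording dates are assumptions made for illustration, while the two glosses are taken from the examples in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredWord:
    lemma: str
    gloss: str                                          # preliminary, rough explanation of meaning
    citations: list[str] = field(default_factory=list)  # one or two corpus citations
    links: list[str] = field(default_factory=list)      # optional encyclopedic links
    recorded: str = ""                                  # date of recording


def prune_described(monitored: list[MonitoredWord], described: set[str]) -> list[MonitoredWord]:
    """Keep only items that have not yet received a comprehensive or concise entry."""
    return [w for w in monitored if w.lemma not in described]


monitored = [
    MonitoredWord(
        lemma="zoomen",
        gloss="to communicate and work with Zoom video conferencing software",
        recorded="April 2020",  # hypothetical date
    ),
    MonitoredWord(
        lemma="Contacttracing",
        gloss="contact tracing",
        recorded="April 2020",  # hypothetical date
    ),
]

# Hypothetical scenario: "zoomen" receives a full entry and leaves the list.
remaining = prune_described(monitored, described={"zoomen"})
print([w.lemma for w in remaining])  # ['Contacttracing']
```

Keeping the record this small mirrors the idea that a monitored word carries only provisional information until its status is settled.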

By the middle of April 2020, the first corona-related neologisms were included 
in this online collection of monitored words to react quickly to the sudden demand 
for information on meaning and usage of new words in the context of the pan- 
demic. Around 30 entries on words such as Contacttracing (‘contact tracing’), Coro- 
nababy (‘child conceived during exit and contact restrictions in Covid-19 pandemic 
(in home quarantine)’ and ‘child of a Covid-19 patient’), or zoomen (‘to communi- 
cate and work with Zoom® video conferencing software’) were published. Soon it 
became obvious that the number of Corona-related neologisms would exceed the 
number of other monitored words by far, so a separate list on the Corona-related 
vocabulary¹⁷ was published at the end of April 2020 (with a little over 60 entries).
Since then, this list (see Figure 1 with the entry on No-Covid-Strategie, ‘no Covid
strategy’, i.e. ‘concept aimed at slowing down or containing the Covid-19 pandemic
as completely as possible through appropriate measures (e.g. pushing the infection 
figures to zero and thus creating virus-free zones)’) was updated every fortnight 
between April and June 2020 and from then on, on a monthly basis. In October 2021, 
the list included more than 1,800 Corona-related neologisms, and still, more than 
700 candidates in an internal database awaited lexicographic description and inclu- 
sion into the online index. 


16 See https://www.owid.de/docs/neo/listen/monitor.jsp (last access: 10 June 2022). 
17 See https://www.owid.de/docs/neo/listen/corona.jsp# (last access: 10 June 2022).

[Figure 1: extract from the alphabetical OWID index (letters M to O, with items such as
Maskenverweigerer, Nacktnase, Nasenbohrertest, Öffnungsdebatte, and Onlineunterricht)
and the opened entry No-Covid-Strategie. Definition: "Konzept, mit dem die
Coronapandemie durch geeignete Maßnahmen (z.B. Drücken der Infektionszahlen auf Null und
damit Schaffung virusfreier Zonen) möglichst vollständig gebremst bzw. eingedämmt
werden soll". Citation: "Wissenschaftler präsentieren "No-Covid-Strategie" [Überschrift]
Die Infektionszahlen auf Null drücken, Grüne Zonen einrichten und neue Fälle
konsequent eindämmen - mit dieser Strategie wollen 13 renommierte Wissenschaftler die
Corona-Pandemie bezwingen." (www.br.de; datiert vom 19.01.2021). Erfasst: Januar 2021]

Figure 1: Extract from the dictionary index “Neuer Wortschatz rund um die Coronapandemie” (‘New 
vocabulary around the Covid-19 pandemic’) in the Neologismenwörterbuch. 


Generally, (semi-)automatic retrieval of neologism candidates (see above) in the
Deutsches Referenzkorpus (DeReKo, ‘German Reference Corpus’) of IDS¹⁸ was used
to detect candidates for the list of new vocabulary around the Covid-19 pandemic. In
addition, the IDS corpus tool cOWIDplus Viewer¹⁹ and its newer version
OWIDplusLIVE²⁰ were deployed, giving access to RSS feeds of thirteen German online
newspapers and magazines in weekly updates. The editorial collection of candidates
through press reading and, for example, browsing the daily Twitter trends was
continued as well, and several other glossaries and lists of Corona-related words (see
above) were evaluated systematically. Finally, many dictionary users participated by
sending their suggestions for new words to be included via an online form in the
Neologismenwörterbuch.²¹


18 See https://www.ids-mannheim.de/digspra/kl/projekte/korpora/ (last access: 10 June 2022).

Dictionary users were informed about the expansion of the list via the IDS newsletter
IDS aktuell²² (published quarterly). Progress on the compilation of the index is also
recognizable by the date of recording given at the bottom of each entry (cf. Figure 1:
Erfasst: Januar 2021, ‘Added: January 2021’). Furthermore, users learn about the
status of the list as work-in-progress from a very short introductory text on the website.
For the first time in its history, the team working on Neologismenwörterbuch also
started to publish a number of shorter lexicological studies on the IDS web page²³
concerning words and specific word groups with special significance within the
corona discourse (e.g. Social Distancing, Maske ‘face mask’, Lockdown and Shutdown),
in which readers are referred to the index of Corona-related vocabulary where
appropriate. In the following section, some examples from this list are discussed to
illustrate the lexicographic challenges arising from expanding the IDS neologism
dictionary “at the pulse of time”.


4.2 New words


In Germany, as soon as the first vaccine for SARS-CoV-2 was found and officially
approved, the vaccination campaign started (in January 2021), with specific groups of
people being prioritized over others for medical reasons. Some people were
dissatisfied with this solution and experienced a feeling of resentment, called Impfneid
(‘vaccine envy’), which was recorded in Neologismenwörterbuch in January 2021 as a
neologism coined in the typical way, following the German compounding rules (verb
impf[en] ‘to vaccinate’ + noun [der] Neid ‘envy’).²⁴ This feeling then led some people
to try to be vaccinated before their turn. These people are referred to as
Impfdrängler²⁵ or Impfvordrängler²⁶ (‘vaccination tailgaters’); both are, yet again,
typical new compounds and were recorded in the dictionary in January 2021. Still, as
Figure 2 illustrates, both words were in use for only a short period of time (January
to May/June 2021), as by July 2021 enough vaccines were available nationwide and
prioritizing immunization was no longer necessary. Thus, both words seem to be
short-lived lexemes which will probably not meet the criteria for neologisms to be
included in Neologismenwörterbuch in the end (see above), but which (a) are justifiably
part of a list of words still monitored, and (b) might well be candidates for a
specialized dictionary on the corona discourse, as the topic of vaccination is
lexically very rich in German and characterized by quite different discourse
participants (e.g. opponents and proponents of vaccination, experts and laypeople).


19 See https://www.owid.de/plus/cowidplusviewer2020/ (last access: 10 June 2022).

20 See https://www.owid.de/plus/live-2021/ (last access: 10 June 2022).

21 See https://www.owid.de/wb/neo/mail.html (last access: 10 June 2022).

22 See https://www.ids-mannheim.de/aktuell/presse/newsletter/ (last access: 10 June 2022).
23 See https://www.ids-mannheim.de/sprache-in-der-coronakrise/ (last access: 10 June 2022).
24 See https://www.owid.de/docs/neo/listen/corona.jsp#impfneid (last access: 10 June 2022).


[Figure 2: line chart. Legend: Word-Form = impfdrängler; Word-Form = impfneid.
X-axis: dates from 2020-12-18 to 2021-07-27.]


Figure 2: Relative frequency of Impfdrängler (‘vaccination tailgater’) and Impfneid (‘vaccination
envy’) in RSS feeds of 13 German online newspapers and magazines between December 2020
and July 2021.²⁷
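
A relative frequency curve of this kind can be approximated with very little code. The following Python sketch is an assumption-laden illustration rather than the method behind OWIDplusLIVE, whose normalization is not described here: it bins tokenized documents by month and reports hits per a fixed number of tokens. The function name, the monthly binning, and the toy documents are invented for this example.

```python
from collections import Counter
from datetime import date


def relative_frequency(docs, word, per=1_000_000):
    """Return {(year, month): hits per `per` tokens} for `word`.

    `docs` is an iterable of (publication_date, tokens) pairs.
    """
    target = word.lower()
    hits, totals = Counter(), Counter()
    for day, tokens in docs:
        month = (day.year, day.month)
        totals[month] += len(tokens)
        hits[month] += sum(1 for t in tokens if t.lower() == target)
    return {m: hits[m] / totals[m] * per for m in totals if totals[m] > 0}


# Toy data, invented for illustration (not taken from the RSS feeds).
docs = [
    (date(2021, 1, 10), ["Impfdrängler", "sorgen", "für", "Ärger"]),
    (date(2021, 1, 20), ["kein", "Treffer", "in", "diesem"]),
    (date(2021, 7, 5), ["nur", "andere", "Wörter", "im", "Text"]),
]
print(relative_frequency(docs, "Impfdrängler", per=1000))
# {(2021, 1): 125.0, (2021, 7): 0.0}
```

Normalizing by the number of tokens per bin matters because the amount of text collected varies over time; raw counts alone could suggest a decline that merely reflects fewer collected articles.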


4.3 New senses 


New senses for long-existing lexemes can be attributed to the corona discourse less
often than new words and phrases. One example is the German abbreviation 3G,
originally denoting the third generation of mobile telecommunication networks
(also called UMTS). In the context of the Covid-19 pandemic, however, this
abbreviation refers to two different notions:

a) three circumstances that should be avoided during the Covid-19 pandemic
to prevent an increase in infections: geschlossene Räume (‘enclosed spaces’),
Gedränge (‘crowding’) and Gespräche ohne Abstand (‘conversations without
distance’),

b) three groups of people with special health status who pose a low risk of
infection in terms of contracting the SARS-CoV-2 virus: Getestete (‘tested persons’),
Geimpfte (‘vaccinated persons’), Genesene (‘persons recovered from Covid-19’).


25 See https://www.owid.de/docs/neo/listen/corona.jsp#impfdraengler (last access: 10 June 2022).
26 See https://www.owid.de/docs/neo/listen/corona.jsp#impfvordraengler (last access: 10 June 2022).
27 Source: “OWiDplusLIVE”, see https://www.owid.de/plus/live-2021/ (last access: 10 June 2022).


Both senses are illustrated in the entry on 3G²⁸ in Neologismenwörterbuch (recorded
in June 2021). Since September 2021, the expression 2G²⁹ has been used to refer to
only two groups, namely vaccinated or recovered persons. By now, both abbreviations are
already part of numerous German compounds (cf. Figure 3). Only the future will tell
whether the new senses of 2G and 3G will come to be used with a wider reference
to methods of fighting an epidemic caused by any virus.


[Figure 3: index extract listing, among other items, 2G, 2G-Modell, 2G-Option,
2G-Optionsmodell, 2G-Regel, 2G-Regelung, 3G, 3G-Modell, 3G-Nachweis, and 3G-Regel,
alongside neighbouring entries such as 15-km-Regel, 30-Sekunden-Regel, and 7-Tage-R.]


Figure 3: Extract from the list of neologisms around the Covid-19 pandemic
in Neologismenwörterbuch showing compounds with 2G and 3G.³⁰


4.4 New uses 


Another interesting example is the word (der/das) Coronavirus (noun, masculine or
neuter gender), which has been attested in German since the 1980s in the context
of AIDS research. Since then, its use has been strikingly uneven: in the texts of the
“German Reference Corpus - DeReKo”, it neither occurs with approximately the same
frequency throughout nor shows a continuously increasing or decreasing tendency, but
instead shows two prominent frequency peaks, namely in the years 2003 and 2013
(cf. Figure 4). The significantly more frequent occurrence of Coronavirus in 2003 is
due to the SARS infection wave discussed at that time, and the high rate in 2013 is
due to quite a few known MERS cases at that time. A further, much more pronounced
peak can be expected for the years 2020 and 2021. This word is thus a good example of
how current events affect the frequency with which words are used. For
Neologismenwörterbuch, however, the term Coronavirus itself is not a suitable candidate
for inclusion as, on the one hand, it had been attested long before the beginning of
the Covid-19 pandemic and, on the other hand, the dictionary does not cover any
periods earlier than 1991 (see above).


28 See https://www.owid.de/docs/neo/listen/corona.jsp#3g (last access: 10 June 2022).
29 See https://www.owid.de/docs/neo/listen/corona.jsp#2g (last access: 10 June 2022).
30 See https://www.owid.de/docs/neo/listen/corona.jsp# (last access: 10 June 2022).


[Figure 4: chart titled "Relative Häufigkeit von Coronavirus in DEREKO-Pressekorpora
(absolute Häufigkeit in Klammern)" (‘Relative frequency of Coronavirus in the DeReKo
press corpora (absolute frequency in brackets)’). X-axis: years. Corpus query:
[orth="[Cc]orona-?[vV]ir.*"]]


Figure 4: Relative frequency of Coronavirus in the “German Reference Corpus - DeReKo” between
1980 and 2019 (with absolute frequency numbers in brackets).³¹
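
The corpus query legible in Figure 4 appears to rest on the regular expression [Cc]orona-?[vV]ir.* applied to the orthographic form of a token. As a hedged illustration of what such a pattern covers, the Python sketch below applies it to a small invented token list; the use of fullmatch() reflects the assumption that the query must match a whole token, and the function name and token list are made up for this example.

```python
import re

# Pattern as read from the query in Figure 4; fullmatch() mimics matching
# against a whole token (an assumption about the query semantics).
CORONAVIRUS = re.compile(r"[Cc]orona-?[vV]ir.*")


def matching_tokens(tokens):
    """Return the tokens whose entire form matches the pattern."""
    return [t for t in tokens if CORONAVIRUS.fullmatch(t)]


# Invented token list, not corpus data.
tokens = ["Coronavirus", "corona-Virus", "Coronaviren",
          "Corona-Virus-Infektion", "Corona", "Coronaparty", "Virus"]
print(matching_tokens(tokens))
# ['Coronavirus', 'corona-Virus', 'Coronaviren', 'Corona-Virus-Infektion']
```

Because the pattern ends in .*, it also captures inflected forms and compounds that begin with Coronavirus, so counts based on it describe the word family rather than the bare lemma.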


In contrast, the noun Corona as an abbreviation of Coronavirus is documented in
Neologismenwörterbuch with a short entry³² with three different senses (‘the SARS-CoV-2
virus’, ‘the disease caused by SARS-CoV-2, i.e. Covid-19’, ‘the Covid-19 pandemic and
the crisis caused by it’) and other lexicographic information. In this case, the
editorial team argued that there is no need to monitor the word further because of its
frequency and communicative relevance.


31 Graph by Mark Kupietz, project “Ausbau und Pflege der Korpora geschriebener Gegenwarts- 
sprache" at IDS Mannheim, see https://www.ids-mannheim.de/digspra/kl/projekte/korpora/ (last 
access: 10 June 2022). 

32 See https://www.owid.de/artikel/408108 (last access: 10 June 2022).

Finally, the entry of the noun (das) Homeoffice with its two senses ‘workspace
at home’ and ‘working from home’ illustrates how new uses originating in the
Covid-19 pandemic led to the revision of existing entries in Neologismenwörterbuch.
Homeoffice in both senses has been widely used in German since the mid-1990s, but the
aspect of using modern telecommunication channels to do so was added to the
definitions in July 2020, and new phrases like im Homeoffice arbeiten (‘to work from
home’) or the synonym Heimbüro ‘home office’ now supplement the entry due to
their increased frequency in the Covid-19 pandemic.

Overall, the list of new words and expressions around the Covid-19 pandemic in
Neologismenwörterbuch contains predominantly single word entries (e.g. (das)
Autokonzert³³ ‘live performance of an artist, music group, or the like, where the audience
listens to the music over a special radio channel while sitting in their car’) and not
more than 10% are multi-word expressions (e.g. (die) Generation Corona,³⁴ ‘age
group of young people particularly affected by the economic consequences of the
Covid-19 pandemic, who have poor prospects of a successful start to their working
lives due to the crisis’). More than 90% of the new lexemes are nouns (e.g. (der)
Maskomat³⁵ ‘vending machine (in front of a store) where individually packaged masks
can be purchased’). Some adjectives (e.g. postpandemisch³⁶ ‘concerning the period
after the Covid-19 pandemic’) and (very few) verbs (e.g. teamsen³⁷ ‘to communicate,
work, hold classes, etc., over the Internet using the Teams® video conferencing
software’) make up the rest.

Most of the German corona-related vocabulary is formed following German
word formation rules, i.e.:

- compounding, e.g. Doppelmutante³⁸ (‘(presumably) highly contagious variant
of SARS-CoV-2 virus with two genome alterations’), from doppelt ‘double’ +
(die) Mutante ‘mutant’,

- derivation, e.g. downlocken³⁹ (‘to restrict economic and social activities (to
contain an epidemic)’), derived from (der) Lockdown + suffix -en,

- abbreviation, e.g. Delta⁴⁰ (‘(presumably) highly contagious mutation of the
SARS-CoV-2 virus, which was detected for the first time in India’), from (die)
Deltavariante.


33 See https://www.owid.de/docs/neo/listen/corona.jsp#autokonzert (last access: 10 June 2022). 
34 See https://www.owid.de/docs/neo/listen/corona.jsp#generation-corona (last access: 10 June 2022). 
35 See https://www.owid.de/docs/neo/listen/corona.jsp#maskomat (last access: 10 June 2022). 
36 See https://www.owid.de/docs/neo/listen/corona.jsp#postpandemisch (last access: 10 June 2022). 
37 See https://www.owid.de/docs/neo/listen/corona.jsp#teamsen (last access: 10 June 2022). 

38 See https://www.owid.de/docs/neo/listen/corona.jsp#doppelmutante (last access: 10 June 2022). 
39 See https://www.owid.de/docs/neo/listen/corona.jsp#downlocken (last access: 10 June 2022). 
40 See https://www.owid.de/docs/neo/listen/corona.jsp#delta (last access: 10 June 2022). 
Only 5-10% of the words on the list are English loanwords (e.g. (das) Weaning,⁴¹
‘slow weaning of an intensive care patient from mechanical ventilation’).
Compounding is the predominant word coining process (over 90%), while derivation or
abbreviation are used only in a few cases (for examples see above).

While the relation between single word and multi-word entries in Neologismen- 
wörterbuch as a whole is roughly the same (90%: 10%) and the proportions of parts 
of speech are similar as well (nouns: 83%, verbs: 8%, adjectives: 3 %, other: 6 %), 
the Neologismenwörterbuch as such contains approximately 61% of words that are 
formed in German, while 39% are loanwords. Those neologisms that are formed in 
German are compounds (70%), derivatives (21%), or abbreviations (9%). One
reason for the deviation of the numbers for the corona-related neologisms in the list
from the data of Neologismenwörterbuch as a whole is that the corona-related
neologisms are still monitored. Thus, whole synonym clusters are part of the
inventory, for example:

- Coronateststation,⁴² Coronateststelle,⁴³ Coronatestzentrum,⁴⁴ COVID-19-Teststation,⁴⁵
COVID-19-Teststelle,⁴⁶ COVID-19-Testzentrum,⁴⁷ all denoting a mobile walk-in or
drive-in facility for people to be tested for a possible infection with the SARS-CoV-2
virus, or

- Immunitätsausweis,⁴⁸ Immunitätsnachweis,⁴⁹ COVID-19-Ausweis,⁵⁰ Covidpass,⁵¹
digitaler Impfpass,⁵² grüner Pass,⁵³ all designating an official document
confirming the immunity of a person with respect to the SARS-CoV-2 virus.


As of October 2021, it remains unclear which lexeme of each group will eventually 
be the most dominant term and thus the most eligible candidate to be entered into 
the reference guide, while the less frequent lexemes are only treated inclusively 
under the corresponding qualified headword. For the lexicographic catalogue of 
monitored words of Neologismenwörterbuch, the general rule applies that semantically
transparent compounds or derivatives are explained in the dictionary, a rule of


41 See https://www.owid.de/docs/neo/listen/corona.jsp#weaning (last access: 10 June 2022).
42 See https://www.owid.de/docs/neo/listen/corona.jsp#coronateststation (last access: 10 June 2022).
43 See https://www.owid.de/docs/neo/listen/corona.jsp#coronateststelle (last access: 10 June 2022).
44 See https://www.owid.de/docs/neo/listen/corona.jsp#coronatestzentrum (last access: 10 June 2022).
45 See https://www.owid.de/docs/neo/listen/corona.jsp#covid-19-teststation (last access: 10 June 2022).
46 See https://www.owid.de/docs/neo/listen/corona.jsp#covid-19-teststelle (last access: 10 June 2022). 
47 See https://www.owid.de/docs/neo/listen/corona.jsp#covid-19-testzentrum (last access: 10 June 2022). 
48 See https://www.owid.de/docs/neo/listen/corona.jsp#immunitaetsausweis (last access: 10 June 2022). 
49 See https://www.owid.de/docs/neo/listen/corona.jsp#immunitaetsnachweis (last access: 10 
June 2022). 

50 See https://www.owid.de/docs/neo/listen/corona.jsp#covid-19-ausweis (last access: 10 June 2022). 
51 See https://www.owid.de/docs/neo/listen/corona.jsp#covidpass (last access: 10 June 2022). 

52 See https://www.owid.de/docs/neo/listen/corona.jsp#digitaler-impfpass (last access: 10 June 2022). 
53 See https://www.owid.de/docs/neo/listen/corona.jsp#gruener-pass (last access: 10 June 2022). 
thumb that is also less strictly applied for comprehensive and concise entries. The
deviations in the data will be studied more closely in the future, when the develop- 
ment of usage and its manifestation can be examined in the IDS corpus data in 
more detail. 


5 Conclusion 


Neologism dictionaries seem to be predestined to react quickly to recent and possibly 
rapid developments of the lexicon of a given language, as it is their objective to cover 
new lexemes and phrases, a phenomenon naturally associated with change. How- 
ever, this can only be the case if the dictionary in question aims at documenting new 
words close to their moment of creation, taking into account the risk that some of the 
neologisms recorded may only be short-lived. In contrast, retrospective neologism 
lexicography runs the risk of offering details about new words at a time where the 
needs of users for this information have decreased and the words in question may no 
longer be considered as novel. In a situation like the Covid-19 pandemic, where the 
lexicon of languages around the world (including German) has expanded to a large 
extent in a very short amount of time and the usage frequency of some words has 
changed perceptibly in contrast to former times, this question has gained even
greater importance. In this study, the solution found for Neologismenwörterbuch of
IDS was demonstrated, i.e., offering both types of information in one online
reference work. On the one hand, fully described neologisms in comprehensive entries
are being compiled for words that emerged in German in a specific period of time of 
language development (the 1990s, the 2000s, the 2010s, the 2020s) and which have 
diffused and are now generally accepted; and on the other hand, some headwords 
are still monitored and are being collected, receiving only brief and essential seman- 
tic specifications. 

Work in the Neologismenwörterbuch project went beyond lexicographic practice by
additionally offering corpus-based, evidence-based research results on neologisms
around the Covid-19 pandemic to the public. The editorial team regularly published
short essays on specific word groups (between April 2020 and August 2021) and got
into contact with dictionary users through email correspondence based on an online
word proposal form that is part of the dictionary. The editors also gave many interviews
to newspapers, magazines, radio, and TV stations⁵⁴ that followed the developments
and the emergence of new coinages around the Covid-19 pandemic with great
interest. Besides regular continuous editorial work, all these activities meant additional


54 See https://www.ids-mannheim.de/aktuell/presse/online-presse/ for the press review of IDS, 
last access: 10 June 2022. 
workloads for a rather small team of lexicographers,⁵⁵ but they helped raise the
general awareness of the lexicographical work at IDS and of the usefulness of
reliable, up-to-date dictionaries, making it worthwhile to immerse oneself in
lexicography “at the pulse of time”.
Kilim Nam, Jinsan An, Hae-Yun Jung

The emergence and spread of Korean 
COVID-19 neologisms in news articles and 
user comments and their lexicographic 
description 


1 Introduction 


COVID-19-related new words have been coined extensively since December 2019, re- 
flecting linguistic response to a social reality and indicating the dynamics of new 
words to cope with unprecedented pandemic situations. From an internal linguistic 
perspective, COVID-19 neologisms are first of all a set of new words which are fo- 
cused on a specific period and a specific topic and show particular tendencies in 
terms of grammar and semantics. Not only do they hold great interest in word- 
formation research, but they also shed light on the relationship between language 
and discourse communities as they reveal the impact of the pandemic and its per- 
ceptions by the public. 

There has been much discussion on vocabulary reflecting social and cultural 
phenomena in relation to lexicographic research. A few decades ago, Williams 
(1976) published a dictionary of cultural keywords, leading to the development of 
the ‘Keywords Project’,¹ which studies diachronic changes of meaning and
synchronic meanings of major words. For Wierzbicka (1999), vocabulary is key to
understanding history, culture and society, and keywords are evidence that
lexicology and lexicography play a central role in interpreting discourse communities.²

Positing that the COVID-19 neologisms as a class of vocabulary constitute the 
keywords of the COVID-19 era, this study aims to analyze the occurrence and usage 


1 Details of the project can be found at https://keywords.pitt.edu/ (last access: 10 June 2022).

2 On the other hand, there are also examples of studies on the relationship between certain lexical 
classes and discourse communities using the quantitative analysis of corpus linguistics, such as 
Stubbs (1996, 2001) who discussed how collocations and the semantic preferences of certain words 
could evidence cultural and ideological characteristics, and Scott (2010) who discussed the method- 
ology as well as the importance of keyword extraction based on statistical significance. 
of such keywords and provide suggestions for their lexicographic representation,
from the perspective of corpus linguistics. While research on the sociolinguistics of 
neologisms or the correlation between culture and neology has focused hitherto on 
particular types of neologisms and sociocultural phenomena, the use patterns of
neologisms depending on genres and registers have not been fully discussed.
However, the proliferation of Web languages and the dynamics of language resulting
from mass communication call for the study of neologisms in relation to genres and 
registers. Thus, this study investigates the neologisms extracted not only from on- 
line news articles but also from the comments accompanying such articles. Com- 
ments written by non-experts are indeed equally crucial to examine as the articles 
written by experts since their respective values for analysis differ in terms of pro- 
duction and use of neologisms. 

The present paper examines the usage of 341 COVID-19 neologisms which ap- 
peared in South Korea over a span of eighteen months (from December 2019 to 
May 2021) and were extracted from a corpus composed of COVID-19-related news ar- 
ticles and comments, the COVID-19 Corpus, in order to address the following research 
questions: 1) How do the 341 COVID-19 neologisms extracted rank in news articles 
and comments respectively?, 2) What usage trends do neologisms designating the 
disease and other high-frequency neologisms show in news articles and comments 
respectively?, 3) What characteristic differences do comments as a non-expert and 
subjective language resource and news articles as an expert and objective language 
resource show and what value may each genre add to the lexicographic description 
of neologisms? 

The following section introduces the composition of the COVID-19 Corpus and 
research methodology. Section 3 provides a quantitative analysis of the COVID-19 
neologisms and Section 4 is a case study of the usage trends of the high-frequency 
neologism K-pangyek³ ‘K-quarantine’. Finally, Section 5 discusses a number of
issues regarding the lexicographic description of such neologisms as K-pangyek.


2 Object of study and methodology 


2.1 Extracting COVID-19 neologisms and building 
the COVID-19 Corpus 


This paper targets 341 neologisms related to COVID-19, which were coined between 
December 2019 and May 2021. This list combines the lists of COVID-19 neologisms 
presented in Lee/Kang/Nam (2020) and Nam et al. (2021), which were expanded with 


3 This paper follows the Yale romanization system in transcribing Korean words. 
neologisms extracted manually.⁴ Lee/Kang/Nam (2020) and Nam et al. (2021) slightly
differ in the composition of the corpus used for extraction, the time span for manual 
extraction, and the identifying criteria. 


Table 1: Comparison of the extraction of COVID-19 neologisms. 


                      Lee/Kang/Nam (2020)                    Nam et al. (2021)
methodology           extraction using a domain-general and  extraction using a domain-general
                      domain-specific newspaper corpus       newspaper corpus
identifying criteria  - occurrence timescale:                - occurrence timescale:
                        December 2019 to August 2020           July 2019 to June 2020
                      - occurrence frequency: 1 or more      - occurrence frequency: 3 or more
neologisms extracted  302                                    229


As seen in Table 1, Lee/Kang/Nam (2020) made use of domain-specific newspapers 
containing the character strings kholona ‘corona’⁵ and khopitu ‘COVID’ in addition
to domain-general newspapers and extracted as many neologisms as possible by 
lowering the minimum occurrence frequency to 1. Nam et al. (2021), on the other 
hand, collected 405 new words for the year 2020 using only a domain-general news- 
paper corpus, from which 229 new words directly or indirectly related to COVID-19 
were selected and classified as COVID-19 new words. Both studies extracted neolo- 
gisms from news articles only and similarly excluded controversial slang and dis- 
criminatory expressions because the scope of the research was set as an extension 
of a state-led neologism research project in both cases. 

The present study combines the two lists from the above studies, to which were 
added expressions with pejorative connotations, such as cwungkwuk pyeylyem ‘Chinese
pneumonia’, ccangkkay pyeylyem ‘Chinky pneumonia’, ccangkkay kholona ‘Chinky
corona’, as well as phrases that appeared after August 2020, including wulye pyeni ‘fear

4 The extraction of neologisms is carried out in four stages. First, a corpus is built by crawling news 
articles on the Naver platform (https://www.naver.com/) within a specific time span and a specific 
scope of media. A list of neologism candidates is compiled by extracting nouns from the corpus 
based on probability and by extracting inflected and uninflected words via morphological analysis. 
The list is then reviewed to determine whether the word forms extracted are already listed as head- 
words in the main online dictionary Urimalsaem. The date of first occurrence is checked in Naver 
News (https://news.naver.com/) and the word forms that meet the identifying criteria are compiled 
as neologisms. In addition, manual extraction is carried out in parallel for a more comprehensive 
extraction. 

5 Korean uses the abbreviated form corona for coronavirus to designate the virus and the disease, 
and often to create COVID-19 related neologisms. 
transition’, pwusuthe syos ‘booster shot’, popoksopi ‘revenge spending’, payksin cengchi
‘vaccine politics’, and pangyek cengchi ‘quarantine politics’.

Table 2 summarizes the composition of the COVID-19 Corpus used to analyze 
the usage of COVID-19 neologisms. 


Table 2: Composition of the COVID-19 Corpus. 


                     Article Corpus           Comment Corpus
genre                COVID-19 related news    COVID-19 related news
                     (news articles)          (comments on news articles)
time scale           January 1st, 2020 - March 31st, 2021 (15 months in total)
target               Articles and comments of news pages wherein the word kholona
                     (‘corona’ for COVID-19) appears at least 3 times⁶
number of items      456,680 articles         13,470,427 comments
number of ecel       122,052,029              159,506,286
(Korean word units)

The COVID-19 Corpus is divided into two sub-corpora, one composed of news
articles (‘Article corpus’) and the other consisting of the readers’ comments on each
article (‘Comment corpus’). The latter is much larger than the former with regard to
both the number of items and the amount of text. The articles selected to build
the corpus include news articles such as reports on the development of the
pandemic (1.a) and articles on the impact of the pandemic on individual lives as well as
culture and society (1.b, 1.c).


(1) a. [Breaking news] 465 domestic contacts with confirmed cases, 70 tests per- 
formed, 12 confirmed cases of new coronavirus 
b. [Headline] Film industry in crisis together with the new coronavirus ... 
The release of The Princess delayed. 
c. Let's go to the sensible “stay at home” online playground! 


While news articles constitute a genre written by experts and characterized by
objectivity, readers’ comments form a genre produced by non-experts and characterized by
subjectivity, thus containing personal opinions, feelings, and any additional
information readers want to share, rather than objective facts. Examples (2) and (3) show


6 Once the corpus was built, some of the COVID-19 neologisms were pre-processed by unifying spell- 
ings, removing unnecessary spaces and new lines, and removing special symbols (e.g., k-quarantine, 
k quarantine, K-quarantine, K-quarantine). 
typical examples from articles and comments that are concerned with the ‘emergency
disaster relief fund’ (2) and ‘corona-blues’ (3).


(2) a. The government has announced a policy of ‘emergency disaster relief
fund’ for the bottom 70% of taxpayers.
b. Think about applying the emergency disaster relief fund where it’s really 
needed, instead of giving it out to the whole country. 


(3) a. The term ‘corona-blues’ was even coined in reference to the ‘depressed’ 
meaning of ‘blue’. 
b. Hopefully it will be less tomorrow? It's really corona-blues. . . I’m getting 
so depressed :( 


As can be seen in the above Examples, (2.a) and (3.a), which have been extracted 
from the ‘Article corpus’, aim to convey objective information such as reporting on 
a governmental policy or defining a neologism, whereas the comments (2.b and 3.b) 
convey the readers’ personal opinions and attitudes on the matter. 

Up until now, the extraction and description of Korean neologisms have mostly 
focused on news articles. This study, however, aims to examine comments made by 
the public too and to compare them to the articles. This would allow us to see how 
the public’s response to the pandemic is expressed in language. Moreover, dictio- 
nary descriptions of the COVID-19 neologisms need to represent people’s creative 
language use patterns beyond objective facts such as reports or promotion of na- 
tional policies. 


2.2 Methodology 


In order to analyze the usage patterns of our 341 COVID-19 neologisms, we have 
built both the ‘Article corpus’ and ‘Comment corpus’ as monthly corpora, classified 
the neologisms into a semantic category system, and carried out the usage trend, 
collocation and n-gram analyses. 

The first step is to look into the quantitative characteristics of the 341 neologisms 
by calculating the overall use frequency of each COVID-19 neologism for each month 
so as to determine the usage trend of each neologism since its first occurrence. As 
will be discussed in section 3, it appears that neologisms that have long gone out of 
use in news articles still appear frequently in the comments, which leads us to ques- 
tion whether it is possible to perform comprehensive frequency analysis for neolo- 
gisms. The neologisms are classified into twelve semantic categories, including 
economy, society, education, health, and religion. The analysis of the main semantic 
domains sheds light on the semantic characteristics of COVID-19 neologisms, which 
turn out to be different from the characteristics of previous years’ neologisms. 
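
The first step described above, counting each neologism per monthly corpus and tracing its trend from first occurrence, can be sketched as follows. This is a minimal illustration in Python and not the authors' pipeline: the function names, the `(month, text)` input format and the sample texts are assumptions made for the example, and plain substring matching stands in for the morphological analysis the study relies on.

```python
from collections import Counter, defaultdict


def monthly_frequencies(documents, neologisms):
    """Count each neologism per month.

    documents: iterable of (month, text) pairs, month written as 'YYYY-MM'.
    Returns {neologism: Counter({month: count})}.
    Assumption: simple substring matching replaces morphological analysis.
    """
    counts = defaultdict(Counter)
    for month, text in documents:
        for word in neologisms:
            n = text.count(word)
            if n:
                counts[word][month] += n
    return counts


def first_occurrence(trend):
    """Return the earliest month in which the neologism is attested."""
    return min(trend) if trend else None


# Invented sample data, for illustration only.
docs = [
    ("2020-02", "K-pangyek praised abroad"),
    ("2020-03", "K-pangyek again; K-pangyek everywhere"),
    ("2020-03", "kholonacengchi is a problem"),
]
trends = monthly_frequencies(docs, ["K-pangyek", "kholonacengchi"])
for word, trend in trends.items():
    print(word, first_occurrence(trend), dict(sorted(trend.items())))
```

Running the same function separately over the Article corpus and the Comment corpus would yield the two trend lines that are compared per genre.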
The focus then shifts to 35 high-frequency neologisms occurring in both
articles and comments, for which quantitative and qualitative analyses are carried
out, and the case of the neologism K-pangyek ‘K-quarantine’ in particular is
examined to explore the applicability of the results in dictionary descriptions. The case
study of K-quarantine entails a three-dimensional analysis of the characteristics of
the COVID-19 neologisms rather than a simple statistical analysis of the collocates
for a particular expression. In other words, we analyze the characteristics of the
COVID-19 neologisms by analyzing n-grams and secondary collocates co-occurring
with the node-collocate pairs in addition to the primary collocate analysis.

Primary collocates and secondary collocates form a network-based method for
collocation analysis as proposed by Brezina et al. (2015), which consists of gradually
expanding the existing node A → collocate B relationship into a network of node
A → collocate B → collocate C of collocate B and so on.⁷ This method allows a better
and richer understanding of a text’s context.


(4) a. He also expressed his aspiration to build a new South Korea by continuing 
the success of K-quarantine, which has become a global standard. 
b. The great K-quarantine... A disgusting regime that takes credit for its 
success by using its people as guinea pigs. 


K-quarantine and success frequently co-occur in both articles (4.a) and comments (4.b). 
However, it is not possible to pinpoint the usage patterns of K-pangyek ‘K-quarantine’ 
in the media by solely examining the primary collocate success. A closer look at
secondary collocates shows that the pair K-quarantine-success often co-occurs with words
such as global, standard, plan in articles, while it tends to co-occur with expressions
such as disgusting, take credit for (all collocates are highlighted in bold in the
examples) in comments. Furthermore, by restricting the part of speech of secondary
collocates to adjectives, the extraction of evaluation and sentiment expressions for a given
COVID-19 neologism becomes all the more effective. Accordingly, the extraction criteria for
primary and secondary collocates are shown in Table 3.
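
The two-step collocate extraction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation or the GraphColl tool: sentences are assumed to be pre-tokenized lists, the function names are hypothetical, raw co-occurrence counts replace any association measure, and the `keep` set is a stand-in for a part-of-speech filter on adjectives.

```python
from collections import Counter


def primary_collocates(sentences, node, window=2):
    """Count tokens occurring within +/- `window` positions of `node`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == node:
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts


def secondary_collocates(sentences, node, collocate, window=2, keep=None):
    """Count other tokens in sentences where `collocate` falls within the
    window of `node`; the whole sentence is the secondary window.
    `keep` optionally restricts the result (here a stand-in for a POS filter).
    """
    counts = Counter()
    for tokens in sentences:
        paired = any(
            tok == node and collocate in tokens[max(0, i - window): i + window + 1]
            for i, tok in enumerate(tokens)
        )
        if not paired:
            continue
        for tok in tokens:
            if tok not in (node, collocate) and (keep is None or tok in keep):
                counts[tok] += 1
    return counts


# Invented toy sentences, for illustration only.
sentences = [
    ["success", "of", "K-quarantine", "global", "standard"],
    ["great", "K-quarantine", "disgusting", "regime", "success"],
    ["K-quarantine", "success", "disgusting"],
]
primary = primary_collocates(sentences, "K-quarantine")
secondary = secondary_collocates(
    sentences, "K-quarantine", "success", keep={"global", "disgusting"}
)
print(primary.most_common(3))
print(secondary)
```

In the second toy sentence, success lies outside the two-token window, so that sentence contributes to the primary counts of other words but not to the secondary collocates of the pair.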

In addition, we performed an n-gram analysis, i.e. an analysis of strings of
high-frequency morphemes from 5-grams to 10-grams, to examine the typical context of
the neologism and understand its usage context beyond the morpheme and word
unit. The n-gram analysis was mainly based on strings with verbs and adjectives as
heads.
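
As a rough illustration of the n-gram step, the following Python sketch collects all 5-grams to 10-grams that contain a given neologism. It is a simplified assumption-laden example: the function name and the sample tokens are invented, tokens stand in for Korean morphemes, and the restriction to strings headed by verbs or adjectives is not modeled.

```python
from collections import Counter


def ngrams_with_node(tokens, node, n_min=5, n_max=10):
    """Count every n-gram (n_min <= n <= n_max) that contains `node`."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # range() is empty when the text is shorter than n tokens.
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if node in gram:
                counts[gram] += 1
    return counts


# Invented sample tokens, for illustration only.
tokens = ["the", "success", "of", "K-quarantine", "is", "praised"]
grams = ngrams_with_node(tokens, "K-quarantine")
for gram, freq in grams.items():
    print(len(gram), " ".join(gram), freq)
```

Over a full corpus, the counts would be accumulated across all sentences and the most frequent strings inspected as typical contexts of the neologism.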


7 Brezina et al. (2015) also introduce Graphcoll, a visualization tool for collocation analysis. A rep- 
resentative study using Graphcoll is Baker (2016) which explains the network and graph theories 
forming the basis of Graphcoll and discusses the necessity of secondary collocate analysis by
comparing the results of the primary collocate analysis for the word troops using AntConc and
WordSmith with the results yielded by the secondary collocate analysis using Graphcoll. See
https://www.futurelearn.com/info/courses/corpus-linguistics/0/steps/104876 (last access: 10 June 2022).
Table 3: Extraction criteria for primary and secondary collocates.


                      Co-occurrent target                    Co-occurrence window
Primary collocates    common noun (NNG), proper noun (NNP),  ±2
                      verb (VV), adjective (VA),
                      adverb (MAG)
Secondary collocates  adjective (VA)                         sentence or whole comment


3 Characteristics of high-frequency COVID-19 
neologisms and semantic categories 


Although this section examines the overall characteristics of the 341 neologisms 
collected, the focus is on high-frequency neologisms appearing in articles and com- 
ments respectively. Table 4 shows the statistical characteristics of the 341 COVID-19 
neologisms by semantic categories.⁸


Table 4: Examples of COVID-19 neologisms classified by semantic categories. 


Semantic category            Number  Ratio  Examples
Politics and Administration  114     33.4   kinkupcaynanciwenkum ‘emergency disaster relief fund’,
                                            thulaypulpepul ‘travel bubble’, kongcekmasukhu
                                            ‘official mask’
Health and Medicine          75      22.0   pwusuthesyos ‘booster shot’, tolphakamyem
                                            ‘breakthrough infection’, keyicintankhithu ‘K-testing kit’
Lifestyle                    51      15.5   kholonaihon ‘corona-divorce’, payksinhyuka ‘vaccine
                                            holiday’, cipkhoksitay ‘stuck-at-home era’
Economy                      33      9.7    kholonomisyokhu ‘coro[navirus] [eco]nomy shock’,
                                            popoksopi ‘revenge spending’, khyukhonomi
                                            ‘Q[uarantine]-[eco]nomy’
Society                      23      6.7    enthaykthusahoy ‘un-[con]tact society’,
                                            tekpwuneychayllinci ‘thank you challenge’,
                                            caytheylkunmwu ‘working-at-home in hotels’


8 For the discussion on the semantic category system for the COVID-19 neologisms, see Lee/Kang/
Nam (2020: 158-161).
Table 4 (continued)


Semantic category            Number  Ratio  Examples
Human beings                 13      3.8    hwakccinja ‘confirmed fatter case’, homomasukwusu
                                            ‘homo maskus’
Education                    11      3.2    kholonakheyisyen ‘corona-[va]cation’, enthaykthulening
                                            ‘un-[con]tact learning’
Clothing                     8       2.3    epheweye ‘upperwear’, phaynthikhoting ‘panty coding’
Food                         6       1.8    talkonakhephi ‘dalgona coffee’, tolpaptolpap ‘cooking
                                            after cooking’
Culture                      5       1.5    enthaykthumwunhwa ‘un-[con]tact culture’,
                                            laynsenumakhoy ‘LAN line concert’
Housing                      1       0.3    pitaymyencwukemwunhwa ‘non-face-to-face housing
                                            culture’
Religion                     1       0.3    ciphapyeypay ‘group Mass’
Total                        341     100


This semantic category system for neologisms was established in 2015 by the 
Korean Neologisms Investigation Project, which had been conducted every year 
from 1994 to 2019 under government supervision (Nam 2019). The classification of 
neologisms into twelve semantic categories, including society, education, economy, 
and politics, shows which domains are the most conducive to the creation of neo- 
logisms each year and hence each period. 

As shown in Table 4, the predominant domains for COVID-19 neologisms are
Politics and Administration (33.4%) and Health and Medicine (22%), the two
combined accounting for more than half of the neologisms, followed by Lifestyle (15.5%),
Economy (9.7%), and Society (6.7%).⁹ In contrast, the predominant semantic
categories of 2015, for example, were Society and Economy. Back then, common
neologisms related to major economic and social issues, while COVID-19 neologisms are
primarily concerned with various welfare policies and health matters. In other
words, neologisms have been abundantly created in the fields of politics and
medicine as a result of the pandemic crisis.¹⁰


9 The trends for these semantic categories have been similarly discussed in Lee/Kang/Nam (2020), 
which targeted 302 COVID-19 neologisms. 

10 The proportion of COVID-19 neologisms in the category of Politics and Administration is partic- 
ularly meaningful considering that this semantic category used to account for only around 5% of 
the neologisms each year according to the Korean Neologisms Investigation Project. 
This is also evident in the high-frequency neologisms listed in Table 5. These
neologisms are presented by corpus, ranked from 1 to 35 in descending order of frequency,
and correspond to roughly 10% of the total 341 COVID-19 neologisms.

All of the top 35 neologisms listed in Table 5 are terms either designating the
virus, such as kholona19 ‘corona-19’, sincongkholonapailesukamyemcung ‘novel
coronavirus disease’, and wuhanpyeylyem ‘Wuhan pneumonia’, or referring to welfare
measures, such as sahoycekkelitwuki ‘social distancing’, saynghwalsokkelitwuki
‘ordinary social distancing’, kwukmincaynanciwenkum ‘national disaster relief fund’
and kinkupcaynanciwenkum ‘emergency disaster relief fund’.

In addition, 22 neologisms (highlighted in grey) are common to both news ar- 
ticles and comments. However, depending on whether they are from a news article 
or a comment, they differ in frequency ranking, usage trends, and the distribution 
of their collocates (this will be discussed in Section 4 in more detail). 

The unmarked neologisms are those which are found in one genre exclusively 
among its top 35 neologisms, either because they are used only in one genre across 
the whole sub-corpus or because they made it to the top 35 in one genre only. In the 
latter case, it is worth examining the reasons why they show such differences in 
rank and frequency. 

For example, kholonapailesukamyemcung-19 ‘coronavirus disease 19’ ranks 17th
in the Article corpus but only 76th in the Comment corpus; on the other hand,
khopitu-19 ‘COVID-19’ ranks 23rd in the comments but is down to 58th in articles. These
examples show the tendency to use official terms in articles, while shorter forms are
preferred in comments. Furthermore, we found a few nicknames designating the
disease with high frequency in the comments (taykwukholona ‘Daegu corona’,
cwungkwukphyeylyem ‘China pneumonia’, cwungkwukkholona ‘China corona’, taykwuphyeylyem
‘Daegu pneumonia’, ccangkkayphyeylyem ‘Chinky pneumonia’), which belong for the
most part to hate speech against specific countries or regions (China, Daegu) and
even social groups such as Shincheonji members (e.g. sinchenciphyeylyem
‘Shincheonji pneumonia’).¹¹ Hate and discrimination speech is highly controversial and
often banned from newspapers. Although these constitute only a few cases, the use 
of people’s comments rather than the official terms in news articles for language de- 
scription presents the advantage of showing the raw usage of language. Nonetheless, 
it also raises the issue of how to assess and deal with users’ unrefined language as 
well as discrimination and hate discourses. If expressions belonging to discrimina- 
tion or hate speech are to be represented in the dictionary, lexicographers need to 
assess the necessity of their inclusion in (or exclusion from) dictionaries. But before 
raising the question of lexicographic inclusion of such terms, lexicographers need to 


11 Shincheonji is a religious sect, the members of which are believed to have acted as super- 
spreaders of COVID-19 in February 2020 in Daegu, whereas only a couple of cases had been de- 
tected in the entire country at that time. 
define the boundaries between discrimination and hate expressions from a legal
perspective and from a linguistic perspective.

Some technical terms that are highly frequent in the Article corpus, such as
onthaykthu ‘on[line] [con]tact’ or welfare policy names (kinkupkoyongancengciwenkum
‘emergency employment security fund’, kikansanepanjengkikum ‘. . . industry
stability fund’), are ranked quite low in the Comment corpus (80th, 156th and 151st
respectively), while blend words, such as theksukhu ‘chin [ma]sk’ or khosukhu ‘nose
[ma]sk’, show the opposite tendency, ranking 11th and 32nd respectively in the comments
but only 65th and 138th respectively in the articles. Another case in point is
kholonacengchi ‘corona politics’, which is highly frequent in comments (6th) compared to
articles (119th). Kholonacengchi ‘corona politics’ is a sarcastic term designating the use
of COVID-19 for political purposes. These examples show that in comments, not only
are creative neologisms more commonly used but they also tend to convey language
users’ opinions and assessments of the COVID-19 situation.
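
The rank contrasts reported above can be reproduced in outline with a short Python sketch. It is only an illustration: the function names are hypothetical, ties are broken alphabetically as an arbitrary assumption, and the frequencies are invented placeholders rather than figures from the COVID-19 Corpus.

```python
def rank_by_frequency(freqs):
    """Map each word to its rank (1 = most frequent).

    Assumption: ties are broken alphabetically.
    """
    ordered = sorted(freqs, key=lambda w: (-freqs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def rank_shift(article_freqs, comment_freqs):
    """Article rank minus comment rank for words attested in both genres.

    A positive value means the word ranks higher in comments than in articles.
    """
    a = rank_by_frequency(article_freqs)
    c = rank_by_frequency(comment_freqs)
    return {w: a[w] - c[w] for w in a if w in c}


# Invented frequencies, for illustration only.
article = {"onthaykthu": 50, "theksukhu": 5, "kholonacengchi": 2}
comment = {"onthaykthu": 3, "theksukhu": 40, "kholonacengchi": 90}
shifts = rank_shift(article, comment)
for word, shift in sorted(shifts.items(), key=lambda kv: -kv[1]):
    print(word, shift)
```

Sorting the neologisms by this shift would bring to the top the items that are characteristic of comments and to the bottom those that are characteristic of articles.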

It is worth noting that the same neologism may also exhibit different character- 
istics in terms of usage depending on whether it is used in news articles or in the 
comments. For instance, in the articles, theksukhu ‘chin [ma]sk’ and khosukhu ‘nose
[ma]sk’ are used when defining the neologism and in reference to related policies
and measure promotion (5), whereas in the comments, the terms are used to ex- 
press one’s attitudes, opinions and/or feelings regarding theksukhu- and khosukhu- 
related phenomena (6). 


(5) a. ‘Chin-mask’ and ‘nose-mask’ still out there, further action needed beyond 
regulations. 
b. The government has warned against the so-called ‘nose-mask’, whereby 
people wear masks with their noses exposed. 
c. Mask or no mask, Seoul cracks down on ‘chin-mask’ and ‘nose-mask’. 
(6) a. I want to punch everyone who wears chin-mask and nose-mask.
b. People wearing chin-mask, nose-mask or no mask, behave yourself.
c. I really hate seeing people wearing chin-mask or nose-mask these days.




These examples show that the usage characteristics of neologisms are obtained all 
the more clearly with concordance and collocate analysis. As the case of theksukhu 
and khosukhu can be extrapolated to many other neologisms, it becomes apparent 
that language resources such as comments may provide useful data for lexicogra- 
phy research as they show aspects of usage in real-life contexts. This is all the more 
crucial in the case of neologisms because their dynamic nature calls for a focus on
actual usage.

As discussed above, when handling language data such as people's comments 
in addition to news articles, it is necessary to adopt a comprehensive approach to 
the extraction and description of COVID-19 neologisms by considering changes in 
the list of headword candidates, different frequency rankings and different usage
characteristics according to varying frequencies across genres. In line with such con- 
siderations, the following section provides a detailed, qualitative study of K-pangyek 
‘K-quarantine’. 


4 Case study: K-pangyek ‘K-quarantine’ 
4.1 Primary collocates of K-pangyek 


K-pangyek ‘K-quarantine’ ranks 11th in frequency in the Article corpus and is
the most frequently used neologism in the Comment corpus. The term first appeared 
in February 2020 and was coined in relation to ways of dealing with the COVID-19 
outbreak. The letter ‘K’ refers to Korea (i.e., South Korea’s quarantine system). In the 
early stages of the pandemic, while most countries struggled with quarantine, South 
Korea established an effective quarantine system using ‘drive-through/walk-through
screening stations’ and ‘live-in treatment centers’, for which it was considered a role 
model for the international community. However, the evaluation of the K-quarantine 
may vary greatly depending on the timing of evaluation, but above all, on the medium 
where it is evaluated, sometimes resulting in contradictory assessments. Indeed, fac- 
tors such as the social class of a given newspaper’s readership and the speaker’s polit- 
ical orientation may exert great influence. 

From a quantitative perspective, K-pangyek occurred 7,920 times in the Article 
corpus and 103,659 times in the Comment corpus from January 2020 to March 2021. 
Monthly use trends are shown in Figure 1. 

The frequency trends of K-pangyek show a very strong correlation with the 
trends in the number of confirmed cases of COVID-19. Summer 2020 began with 
the second wave of the pandemic in South Korea, the number of infected persons 
increasing throughout the summer and even doubling from May to June 2020. In 
parallel, the neologism hit peaks in June and August 2020. The highest peak (for 
both the neologism and the confirmed cases) was in December 2020, which coincides
with the third wave. These trends reflect the failure of the K-quarantine model,
which fueled fierce criticism in comments. The third wave, in particular, sparked
strong criticism of the delay in the supply and administration of COVID-19 vaccines
compared to other countries.

To understand the differences in the overall usage of K-pangyek, we extracted 
the 30 most frequent primary collocates occurring within an L2R2 window from 
both article and comment examples, targeting only content words. Thus, these col- 
locates include common nouns (NNG), proper nouns (NNP), verbs (VV), adjectives 
(VA), and adverbs (MAG), which are presented in Table 6 in both absolute frequen- 
cies and normalized frequencies (1 per 100,000 ecel). 
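The extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: sentences are assumed to be already tokenized and POS-tagged as lists of `(form, tag)` pairs, the window is assumed to be counted over all tokens before the content-word filter is applied, and the function names and the size used for normalization are hypothetical.

```python
from collections import Counter

# Content-word tags named in the text: common/proper nouns, verbs, adjectives, adverbs
CONTENT_TAGS = {"NNG", "NNP", "VV", "VA", "MAG"}


def window_collocates(sentences, target, left=2, right=2, tags=CONTENT_TAGS):
    """Count content-word collocates within a left/right window around `target`."""
    counts = Counter()
    for sent in sentences:
        for i, (form, _tag) in enumerate(sent):
            if form != target:
                continue
            start = max(0, i - left)
            end = min(len(sent), i + right + 1)
            for j in range(start, end):
                if j == i:
                    continue  # skip the target itself
                c_form, c_tag = sent[j]
                if c_tag in tags:
                    counts[(c_form, c_tag)] += 1
    return counts


def normalize(count, size, per=100_000):
    """Normalized frequency: occurrences per `per` units (assumed: per 100,000 ecel)."""
    return count / size * per


# Toy tagged sentences, invented for illustration only
toy = [
    [("K-pangyek", "NNG"), ("uy", "JKG"), ("sengkwa", "NNG"), ("lul", "JKO"), ("hongpo", "NNG")],
    [("cengpwu", "NNG"), ("ka", "JKS"), ("K-pangyek", "NNG"), ("ul", "JKO"), ("calangha", "VV")],
]
primary = window_collocates(toy, "K-pangyek")
```

Calling `primary.most_common(30)` would then yield a ranked list comparable in shape to the one reported in Table 6.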
Table 6

Rank  Articles                                        Comments
      Morpheme                    Tag  Abs.   Norm.   Morpheme                    Tag  Abs.   Norm.
                                       fr.    fr.                                      fr.    fr.
1     sengkwa ‘achievement’       NNG  473    334     ha ‘do’                     VV   10768  475
2     seykyey ‘world’             NNG  454    321     ㅋ ‘haha’                    NNG  9596   424
3     hongpo ‘promotion’          NNG  447    316     calang ‘boast’              NNG  3499   154
4     sengkong ‘success’          NNG  359    254     seykyey ‘world’             NNG  3338   147
5     cengpwu ‘government’        NNG  357    252     hongpo ‘promotion’          NNG  3293   145
6     kwukmin ‘citizen’           NNG  313    221     calangha ‘boast’            VV   3218   142
7     cheykhu ‘check’             NNG  290    205     kwukmin ‘citizen’           NNG  3041   134
8     phaykthu ‘fact’             NNG  289    204     cengpwu ‘government’        NNG  2760   122
9     moteyl ‘model’              NNG  285    201     mwunjayin ‘Moon Jae-in’     NNP  2708   120
10    kholona-19 ‘COVID-19’       NNG  265    187     cahwacachanha ‘self-praise’ VV   2568   113
11    hankwuk ‘South Korea’       NNP  228    161     pangyek ‘prevention’        NNG  2516   111
12    toy ‘become’                VV   223    158     eps ‘not be’                VA   2457   108
13    is ‘link’                   VV   205    145     payksin ‘vaccine’           NNG  2424   107
14    kwukcey ‘international’     NNG  195    138     calha ‘do well’             VV   2346   104
15    tayha ‘about’               VV   184    130     iss ‘be’                    VA   2282   101
16    mopem ‘role model’          NNG  177    125     cahwacachan ‘self-praise’   NNG  2143   95
17    wuswuseng ‘excellence’      NNG  176    124     silphay ‘failure’           NNG  2090   92
18    pangyek ‘prevention’        NNG  174    123     toy ‘become’                VV   2054   91
19    ha ‘do’                     VV   172    122     ka ‘go’                     VV   2052   91
20    cen ‘whole’                 NNG  163    115     kath ‘similar’              VA   1971   87
21    pyocwun ‘standard’          NNG  149    105     silchey ‘true colour’       NNG  1953   86
22    kyenghyem ‘experience’      NNG  148    105     kholona ‘corona’            NNG  1783   79
23    pat ‘receive’               VV   147    104     ttay ‘when’                 NNG  1615   71
24    ceyphwum ‘product’          NNG  144    102     ecce ‘how’                  VV   1610   71
25    wiki ‘crisis’               NNG  143    101     ttetul ‘make noise’         VV   1594   70
26    taythonglyeng ‘president’   NNG  139    98      an ‘not’                    MAG  1593   70
27    calangha ‘boast’            VV   137    97      choyko ‘best’               NNG  1483   65
28    kholona ‘corona’            NNG  136    96      honpoha ‘promote’           VV   1465   65
29    tto ‘too’                   MAG  124    88      sengkong ‘success’          NNG  1462   65
30    eps ‘not be’                VA   119    84      soli ‘noise’                NNG  1457   64


Exclusive collocates that appear as primary collocates only in one genre have 
been highlighted in grey. These are particularly useful in showing the various eval- 
uation tendencies in articles and comments. In the case of articles, exclusive key- 
words, such as sengkwa ‘achievement’, moteyl ‘model’, wuswuseng ‘excellence’, or 
pyocwun ‘standard’, show a positive evaluation of K-quarantine, while in the com- 
ments, evaluations appear rather negative with frequent collocates such as cahwa- 
cachanha ‘self-praise’, silphay ‘failure’, silchey ‘true colour’, and ttetul ‘make noise’. 
As can be seen in the examples below, exclusive collocates are representative of the 
evaluation of the K-quarantine made either in articles or in comments. Example (7) 
shows the typical case of the collocate moteyl ‘model’ in articles, and (8) the typical 
case of the collocate ‘Moon Jae-in’¹² in comments.


(7) a. The K-quarantine model, as a reliable shield against COVID-19, has been
shared with the whole world.
b. After achieving global recognition with its drive-through screening sta-
tions and testing kits, the K-quarantine model has also yielded impressive
results.
c. Singapore introduces the ‘drive-through screening station’ of the K-quar-
antine model.


(8) a. K-quarantine has failed because of Moon Jae-in.
b. Moon Jae-in called every country to brag about sharing the secret of
K-quarantine but look what it's become now.
c. Moon Jae-in has spent 120 billion won to promote a K-quarantine that
does not even work. And then he had no more money to buy vaccines.


12 Examples with the collocate ‘Moon Jae-in’ generally convey a negative evaluation; however, there
are a few cases of positive evaluation, such as: Thumbs up for President Moon Jae-in’s ‘K-quarantine’
that is recognized around the world! All Koreans respect it and support it.
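The notion of exclusive versus common collocates used in this section amounts to a set comparison between two ranked lists. A minimal sketch follows; the function name is hypothetical, and the short lists are an abbreviated illustration built from a handful of morphemes mentioned in the text, not the full top-30 lists.

```python
def split_collocates(ranked_a, ranked_b, top_n=30):
    """Split two ranked collocate lists into exclusive and shared items.

    Returns (only_a, only_b, shared), each preserving the original rank order.
    """
    top_a, top_b = ranked_a[:top_n], ranked_b[:top_n]
    set_a, set_b = set(top_a), set(top_b)
    only_a = [w for w in top_a if w not in set_b]
    only_b = [w for w in top_b if w not in set_a]
    shared = [w for w in top_a if w in set_b]
    return only_a, only_b, shared


# Abbreviated illustration only (not the complete lists)
articles_top = ["sengkwa", "seykyey", "hongpo", "moteyl"]
comments_top = ["calang", "seykyey", "hongpo", "silphay"]
only_articles, only_comments, common = split_collocates(articles_top, comments_top, top_n=4)
```

Because the result depends on the cut-off `top_n`, an item counted as exclusive at rank 30 may turn out to be shared once the lists are extended, which is the limitation the following paragraph points out.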


In addition to the exclusive collocates, the top 30 most frequent collocates also
include common collocates, such as cengpwu ‘government’, sengkong ‘success’, or
kholona ‘corona’, and there are most likely other exclusive collocates that did not
rank in the top 30. Therefore, a clear difference between articles and comments cannot
be asserted categorically on the basis of the primary collocates presented above alone.
This requires additional quantitative analysis, including in particular a primary
collocate analysis with a wider window. Moreover, the narrow L2R2 window suffers
from the shortcoming that adjectives (VA), which directly express evaluations and
emotions, are left out of the primary collocation lists. Therefore, it is necessary to
carry out a secondary collocate analysis as well as an n-gram analysis to cope with
such shortcomings. The next section first explores the co-occurrences of two common
primary collocates of K-pangyek, namely cengpwu ‘government’ and payksin
‘vaccine’ (which did not make it into the top 30 in the case of articles), by analyzing
entire sentences and/or comments, and additionally performs a 5-gram analysis.
Figure 2 presents the analytic process for co-occurrences of K-pangyek, focusing on
the two cases of cengpwu and payksin.


Figure 2: Analytic process for the co-occurrences of K-pangyek.


4.2 Secondary collocates and n-grams for K-quarantine
4.2.1 Cengpwu ‘government’ 


First, let's examine the secondary collocates for the K-pangyek-cengpwu pair. These
collocates comprise all the adjectives (VA) in sentences where K-pangyek and cengpwu
co-occur, the extraction scope being extended to the whole sentence or comment.
Table 7 presents the top 20 secondary collocates in descending order of frequency,
showing both absolute frequencies and normalized frequencies (1 per
10,000 ecel).
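The secondary collocate step just described can be sketched in the same spirit. The sketch assumes each sentence or comment is available as a list of `(form, tag)` pairs; the function name and the toy data are hypothetical and serve only to show the logic of filtering on the co-occurrence of the node word and its primary collocate.

```python
from collections import Counter


def secondary_collocates(units, target, primary, tag="VA"):
    """Count words tagged `tag` in every unit where `target` and `primary` co-occur.

    `units` are whole sentences or comments, so no window restriction applies.
    """
    counts = Counter()
    for unit in units:
        forms = {form for form, _tag in unit}
        if target in forms and primary in forms:
            counts.update(form for form, t in unit if t == tag)
    return counts


# Toy tagged units, invented for illustration only
toy_units = [
    [("cengpwu", "NNG"), ("K-pangyek", "NNG"), ("mwunungha", "VA")],
    [("K-pangyek", "NNG"), ("coh", "VA")],                       # no primary collocate
    [("cengpwu", "NNG"), ("K-pangyek", "NNG"), ("nuc", "VA"), ("mwunungha", "VA")],
]
adjectives = secondary_collocates(toy_units, "K-pangyek", "cengpwu")
```

Running the same function with `primary="payksin"` would give the counterpart for the second pair examined below.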


Table 7: Top 20 secondary collocates/cengpwu ‘government’ (NNG).

Rank  Articles                                        Comments
      Morpheme                    Tag  Abs.   Norm.   Morpheme                    Tag  Abs.   Norm.
                                       fr.    fr.                                      fr.    fr.
1     iss ‘be’                    VA   192    75      eps ‘not be’                VA   4714   89
2     eps ‘not be’                VA   94     37      iss ‘be’                    VA   3416   64
3     manh ‘many’                 VA   44     17      kath ‘similar’              VA   1727   33
4     ppalu ‘swift’               VA   33     13      mwunungha ‘incompetent’     VA   1268   24
5     kath ‘similar’              VA   29     11      kuleh ‘like that’           VA   992    19
6     khu ‘big’                   VA   26     10      ileh ‘like this’            VA   852    16
7     saylop ‘novel’              VA   20     8       coh ‘good’                  VA   663    13
8     elyep ‘difficult’           VA   20     8       manh ‘many’                 VA   624    12
9     nop ‘high’                  VA   20     8       hansimha ‘pathetic’         VA   462    9
10    kanungha ‘possible’         VA   20     8       ecceh ‘how’                 VA   430    8
11    pwucokha ‘insufficient’     VA   19     7       himtul ‘strenuous’          VA   422    8
12    kuleh ‘like that’           VA   19     7       khu ‘big’                   VA   390    7
13    sinsokha ‘prompt’           VA   18     7       nuc ‘late’                  VA   323    6
14    coh ‘good’                  VA   17     7       cwungyoha ‘important’       VA   299    6
15    cwungyoha ‘important’       VA   16     6       taytanha ‘impressive’       VA   268    5
16    phylyoha ‘necessary’        VA   15     6       cek ‘few’                   VA   242    5
17    ecceh ‘how’                 VA   14     6       kanungha ‘possible’         VA   222    4
18    ileh ‘like this’            VA   13     5       calna ‘distinguished’       VA   207    4
19    wanpyekha ‘perfect’         VA   13     5       silh ‘dislike’              VA   196    4
20    chelceha ‘thorough’         VA   12     5       phylyoha ‘necessary’        VA   193    4


The exclusive secondary collocates (highlighted in grey) presented in Table 7 
show a clearer article-comment divide in the evaluation of the K-quarantine. Ar- 
ticles tend to use more positive adjectives, such as ppalu ‘swift’, saylop ‘novel’, nop 
‘high’, wanpyekha ‘perfect’ or chelceha ‘thorough’, while comments tend to contain 
adjectives conveying rather negative evaluation, including mwunungha ‘incompetent’,
hansimha ‘pathetic’, himtul ‘strenuous’, nuc ‘late’, and silh ‘dislike’. Mwunungha
‘incompetent’ in particular, which is the most frequent exclusive secondary
collocate in comments, occurs only once as a secondary collocate in articles. Moreover,
in the case of comments, other adjectives that have a high frequency but are
not in the top 20 include taptapha ‘frustrating’ (171 occurrences), yekkyepha
‘disgusting’ (134 occurrences), anilha ‘complacent’ (132 occurrences), and mengchengha
‘stupid’ (132 occurrences), thus showing a strong tendency towards more direct and
hostile criticism in comments as compared to articles.


4.2.2 Payksin ‘vaccine’ 


As a primary collocate of K-pangyek, payksin ‘vaccine’ ranks 13th in the Comment
corpus and 34th in the Article corpus. For the secondary collocates for the
K-pangyek-payksin pair, we again focused on adjectives (VA) and listed the most frequent ones
in Table 8. Just as in Table 7, frequencies are presented in descending order, and in 
both absolute and normalized (1 per 10,000 ecel) values. 

Similarly to the case of cengpwu and rather unsurprisingly, the exclusive sec- 
ondary collocates are dominated by positive evaluation in articles (e.g. thanthanha 
‘strong’, sinsokha ‘prompt’, kakkap ‘close’) and negative evaluation in comments 
(e.g. mwunungha ‘incompetent’, nuc ‘late’, hansimha ‘pathetic’). There are also a few
positive adjectives in comments, such as coh ‘good’, taytanha ‘impressive’, and 
calangsulep ‘proud’; however, they are often used ironically or sarcastically to 
mock K-quarantine or criticize the government’s policies, as shown in the examples 
below (the collocates are highlighted in bold characters). 
Table 8: Top 20 secondary collocates/payksin ‘vaccine’ (NNG).

Rank  Articles                                        Comments
      Morpheme                    Tag  Abs.   Norm.   Morpheme                    Tag  Abs.   Norm.
                                       fr.    fr.                                      fr.    fr.
1     iss ‘be’                    VA   213    109     eps ‘not be’                VA   5385   92
2     eps ‘not be’                VA   78     40      iss ‘be’                    VA   3749   64
3     kath ‘similar’              VA   31     16      kath ‘similar’              VA   1671   28
4     kanung ‘possible’           VA   31     16      mwunungha ‘incompetent’     VA   1160   20
5     cwungyoha ‘important’       VA   29     15      kuleh ‘like that’           VA   1002   17
6     thanthanha ‘strong’         VA   29     15      nuc ‘late’                  VA   836    14
7     sinsokha ‘prompt’           VA   18     9       coh ‘good’                  VA   820    14
8     khu ‘big’                   VA   18     9       ileh ‘like this’            VA   700    12
9     kakkap ‘close’              VA   17     9       manh ‘many’                 VA   588    10
10    phylyoha ‘necessary’        VA   16     8       ecceh ‘how’                 VA   490    8
11    mwumoha ‘reckless’          VA   16     8       hansimha ‘pathetic’         VA   448    8
12    manh ‘many’                 VA   15     8       cwungyoha ‘important’       VA   420    7
13    ppalu ‘fast’                VA   14     7       taytanha ‘impressive’       VA   326    6
14    wuswuha ‘outstanding’       VA   14     7       kanungha ‘possible’         VA   310    5
15    kuleh ‘like that’           VA   13     7       ppalu ‘fast’                VA   304    5
16    talu ‘different’            VA   13     7       himtul ‘strenuous’          VA   301    5
17    thunthunha ‘robust’         VA   12     6       calna ‘distinguished’       VA   293    5
18    nop ‘high’                  VA   11     6       khu ‘big’                   VA   265    5
19    saylop ‘novel’              VA   11     6       calangsulep ‘proud’         VA   257    4
20    ancenha ‘safe’              VA   11     6       phylyoha ‘necessary’        VA   214    4
(9) a. They made a big fuss about K-quarantine last year, but now - good job
with the vaccine.
b. Bragging sounds good ... the good stuff is all about K-quarantine ...
but what's the point, vaccine inoculations haven't started yet


(10) a. They bragged about K-quarantine crowing that they will be first in the
world at developing the vaccine, but are we even in the early stages of
clinical trials? That's impressive. Is this media manipulation?
b. What an impressive K-quarantine. Spending 120 billion won in promot-
ing that K-quarantine. If they had lobbied for the vaccine with that
money instead, we would be overflowing with vaccines already. Where
on earth did that money disappear? Who the hell used it all?


(11) a. Proud K-quarantine: 0 Pfizer vaccines secured.
b. Even developing countries have bought Pfizer and Moderna vaccines...
I’m so proud of you, damn K-quarantine.


While coh ‘good’ did count a few positive evaluations, negative contexts, including 
sarcastic expressions (9.a: “good job”; 9.b: “sounds good”), accounted for most of the 
cases. In the same vein, more than 90% of the 326 examples containing taytanha 
‘impressive’ and the 257 examples containing calangsulep ‘proud’ in the comments 
are sarcastic, as can be seen in (10) and (11). Commenters make use of irony to
criticize the government's emphasis on the success of K-quarantine at the beginning of
the pandemic, express their distrust of the government, and convey their anger or
frustration about the mismanagement of the vaccine supply and inoculations.


4.2.3 N-gram analysis 


The usage examples of 5-grams including K-pangyek illustrate more directly the
patterns that are used with high frequency in articles and comments. Table 9 presents
the top 15 5-grams in descending order of frequency in each sub-corpus.¹³

In the case of articles, the n-grams are mostly extracted from public statements 
on COVID-19 by a small number of experts, such as spokespersons of the govern- 
ment, political parties, and quarantine authorities, and therefore, contain rather posi- 
tive evaluations on K-quarantine (n-grams 1, 2, 6, 7, 9, 13, and 14 in the “Articles” 
column). On the other hand, comments are produced by a wide range of users and
exhibit the typical patterns produced by commenters speaking about K-quarantine.


13 5-grams from identical comments posted by the same users on multiple newspapers and 5-grams 
that are actually parts of 6-grams and above were excluded from the list. 
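A simple way to obtain such a list is sketched below. It is an illustration under stated assumptions rather than the procedure actually used: each text is assumed to be a list of morpheme tokens, n-grams are counted over every text that mentions the target, and only the removal of verbatim duplicate texts is implemented (the exclusion of 5-grams that belong to longer recurring sequences is left out). The function name is hypothetical.

```python
from collections import Counter


def ngrams_in_texts_with_target(texts, target, n=5):
    """Count n-grams in texts that contain `target`, skipping verbatim duplicates."""
    counts = Counter()
    seen = set()
    for tokens in texts:
        key = tuple(tokens)
        if key in seen:
            continue  # identical text posted more than once
        seen.add(key)
        if target not in tokens:
            continue
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy token lists, invented for illustration only
toy_texts = [
    ["K-pangyek", "kath", "un", "soli", "ha", "ci"],
    ["K-pangyek", "kath", "un", "soli", "ha", "ci"],   # duplicate, ignored
    ["payksin", "hwakpo", "to", "an", "ha"],            # no target, ignored
]
five_grams = ngrams_in_texts_with_target(toy_texts, "K-pangyek")
```

`five_grams.most_common(15)` would then give a ranking of the kind shown in Table 9.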
Table 9: Top 15 5-grams for K-pangyek ‘K-quarantine’.

Articles
Rank  Freq.  n-gram
1     90     seykyey-uy pyocwun-i toy ‘become a global standard’
2     74     seykyey-uy mopem-i toy ‘become a role model for the world’
3     51     K-pangyek-uy sengpay-lul kel ‘risk the success or failure of K-quarantine’
4     39     ek wen-ulo tayphok nulli ‘drastically increase by [number] won’
5     38     payksin coki hwakpo-ey silpayha ‘fail to secure vaccines early’
6     37     sengkwa-ka kyengcey-lo ieci ‘achievements are linked to the economy’
7     37     tayhanminkwuk-uy cabwusim-i toy ‘be the pride of South Korea’
8     36     kyenghem-ul seykyey-wa kongyuha ‘share the experience with the world’
9     35     wuswuha-m-ul yesilhi poi ‘clearly show excellence’
10    34     ka toy-l swu iss ‘could become’
11    33     cwuchey-ka toy-e ilwu ‘achieve as an agent’
12    30     K-pangyek-uy kipan-i toy ‘become the foundation of K-quarantine’
13    29     wiki swunkan-ey tewuk kangha ‘stronger in times of crisis’
14    28     uy sengkong-ul tewuk tuntunha ‘[verb] more reliably the success of’
15    28     cinanhay K-pangyek cahwacachan-ey tochwiha ‘intoxicated with the self-praise of last year's K-quarantine’

Comments
Rank  Freq.  n-gram
1     387    K-pangyek kath-un soli ha ‘blabber about K-quarantine’
2     160    K-pangyek-ini mwe-ni ha ‘is it K-quarantine or what’
3     142    ha-nun kes-i eps ‘there is nothing [somebody] doesn’t do’
4     128    K-pangyek calang-cil-man ha ‘only brag about K-quarantine’
5     102    un ancwung-ey-to eps ‘not care at all’
6     95     K-pangyek-un eti-lo ka ‘what is going on with K-quarantine’
7     87     ha-lttay-pwuthe alapo ‘know from the moment when’
8     87     1-nyen-i ta toy ‘it’s been almost a year’
9     77     masukhu cal ssu-ko tani ‘go around with the mask on’
10    73     so ilh-ko oyyangkan kochi ‘lock the stable after the horse has bolted’
11    67     an ha-ko mwe ha ‘what is [somebody] doing, instead of’
12    62     man ha-l cwul al ‘only know how to’
13    53     pyengsang hwakpo-to an ha ‘don’t even secure sickbeds’
14    53     hongpo-pi-lo-man thangcinha ‘squander only on promotional expenses’
15    52     ha-l mal-i-eps ‘be lost for words’

Quite significantly, many of these patterns are composed of high-frequency verbs
such as ha ‘do’ (n-grams 1, 2, 3, 4, 11, 12, 13, and 15 in the ‘Comments’ column) and
constructions with the adjectives iss ‘be’ and eps ‘not be’ (n-grams 3, 5, and 15),
which cannot be captured through collocate analysis alone. A sample of corresponding
n-gram concordance lines for articles and comments is shown in examples (12) and
(13) respectively.


(12) N-gram concordances in ‘Articles’

a. K-pangyekun seykyeyuy phyocwuni toyessko seykyeyeyse kacang ppalli
kyengceylul hoypokhako isssupnita. ‘K-quarantine has become a global
standard and is rebuilding the economy faster than anything else in the
world.’

b. mwun taythonglyengto onul(28il) siceng yenseleyse “K-pangyekun cen
seykyeyuy mopemi toymye, tayhanminkwukuy capwusimi toyessta.”
lako phyengkahaysssupnita. ‘Today, the 28th, President Moon has stated
in a governmental address that “K-quarantine is a role model for the
whole world and makes the pride of South Korea.”’

c. ichelem nollawul cengtolo palcenhan poken uylyo cheykyeywa paio
uyyakphwum sayngsan nunglyeki K-pangyekuy kipani toyesssupnita. ‘Such
remarkably advanced healthcare system and biopharmaceutical production
capacities were the foundation of K-quarantine.’

d. K-pangyek yeysanul ico 8chenek wenulo tayphok nullyesssupnita.
‘K-quarantine budget has been drastically increased to 1.8 trillion won.’


(13) N-gram concordances in ‘Comments’

a. taymanun cikum kholona hwanca = 0 iketun. mwusun K-pangyek kathun
soli hako issnya? ‘Right now there is 0 COVID case in Taiwan. What
K-quarantine are you blabbering about?’

b. ikesi palo mwunceyini calanghanun K-pangyekita. mwe hana ceytaylo
hanun kesi epsta. ‘This is the K-quarantine Moon Jae-in is boasting
about. There is not one thing he is doing right.’

c. cengpwuka kwukminuy kosayngun ancwungeyto epsnun kes kathta.
wulinalanun wancenhi papo hokwuta. cen seykyey kamyemcatuli mollyeokeyssta.
ikey K-pangyekilanta. ‘The government doesn't seem to care about
its citizens’ suffering. Our country is a complete laughing stock. People
are coming from all over the world. This is what they call K-quarantine.’

d. ikesi K-pangyek!! omanpangcahal ttaypwuthe alapwassta ㅋㅋㅋ ‘This
is K-quarantine!! I knew it from the moment he started to brag about it
hahaha’

e. pyengsang hwakpoto anhay~ payksin hwakpoto anhay~ yethay 1nyentongan
cincca han key mweeyyo?? cengpwunimtul?? ceyka pokieyn K-pangyek
man calangkeli han ke malkon amwukesto mos han ke kathuntey... ‘You
didn’t even secure sickbeds~ You don't even secure vaccines~ what is it
that you have done for the last one year? Government people?? All I can
see is you bragging about K-quarantine but nothing else . . .’


By extracting patterns from the entire sentence or comment, rather than from the
limited vocabulary surrounding K-pangyek, n-gram analysis reveals contextual
meaning that collocation analysis cannot capture. It also confirms the differences
in usage trends between articles and comments.

The analyses of L2R2-window primary collocates, secondary collocates, and n-grams of
K-pangyek demonstrated that although the neologism was used with high frequency
in both news articles and comments, its usage patterns were rather different, if
not opposite to each other. Articles tended to praise K-quarantine or at
least talk about it in neutral terms, as they report on the official stance of the
government. Comments, on the other hand, left room for personal opinions, criticisms and
negative evaluations of the government's actions, often using irony. Although comments
cannot be said to represent the entirety of language perfectly, they do constitute
‘real-life examples’ of subjective evaluation and creative language which cannot be
found in article texts. In that sense, it has become necessary to take
them into consideration in neologism research and dictionary description.


5 From experts’ language to people’s language: 
Suggestions for the lexicographic description 
of COVID-19 neologisms 


By way of conclusion, this last section examines ways to incorporate the above
analyses into the lexicographic description of COVID-19 neologisms. Research on
neologisms has mainly focused on the genre of ‘news articles’ so far. In the same vein,
neologism headwords and the usage examples provided by dictionaries for such
headwords are often newspaper-based. As demonstrated in the previous sections,
comments as language resources by non-expert writers provide a new, raw facet of
language that reflects to a great extent, at least in terms of neologisms, the creativity
and diversity of language users.

The utilization of user-generated content such as comments in neologism de- 
scription entails a change in the scope and the method of the lexicographic descrip- 
tion of neologisms. Here we discuss the case of COVID-19 neologisms in terms of 
both macrostructure and microstructure. 

First of all, it is rather evident that the macrostructure changes according to the 
language resources used to collect headwords. On the one hand, the example of 
names designating COVID-19 evidences the potential diversity of media and genres 
in neology research, but on the other hand, it shows some limitations and raises
issues to be addressed. For instance, the official name of the disease according to
the South Korean government is kholonapailesukamyemcung-19 ‘coronavirus
infection 19’. Figure 3 shows that only three other names are used more frequently in
news articles. In comments, however, the official name is rarely used and instead,
there are no fewer than ten other names which are used more frequently, as shown in
Figure 4. For the most part, the appellations found in comments are strongly biased
and are therefore virtually absent from articles, with the exception of
wuhanphyeylyem ‘Wuhan pneumonia’.¹⁴

Metcalf (2002) and Barnhart (2007) have proposed the diversity of genres as
one of the determining criteria for the establishment of neologisms and their
inclusion in the dictionary, along with frequency and time span. Thus, the use of
comments as a language resource for the extraction of neologisms to be included in
Korean dictionaries would mean securing the diversity of the genres where
neologisms may potentially appear, but also redefining the other determining criteria,
including frequency, which have been traditionally used in Korean neology and
lexicography research (Nam 2015). Furthermore, as illustrated in (14), commenters’
language, and broadly speaking social media language, more often than not
contains socially problematic discriminatory and hate expressions, thereby requiring
language experts’ examination and discussion with regard to the representation of
such deviant language. Regardless, the diversity and non-normative deviance of
comments offer great prospects for neologism research.

Korean neologism research has tended to focus on frequency when it comes to
the establishment of new words, but comments often show a completely different
tendency from articles. A case in point is the neologism wuhanphyeylyem ‘Wuhan
pneumonia’ (Lee/Kang/Nam 2020: 154). The neologism was used extensively at the
beginning of the pandemic but appeared to fall out of use as early as March 2020.
Indeed, as shown in Figure 5, judging solely by its frequency distribution in news
articles, wuhanphyeylyem looks like a dying neologism; however, the frequency
trends of wuhanphyeylyem in comments tell a different story, in which the neologism
is still in use and has the potential to survive for many more years.

Regarding the microstructure of COVID-19 neologism headwords, comments pro- 
vide examples of language spoken in real life in contrast with the traditional examples 
taken from articles, which do not necessarily provide typical usages of neologisms. In 
addition, the inclusion of socio-cultural characteristics, which can be based on com- 
ment data, and contextual and pragmatic information on the headword may be crucial 
depending on the type and purpose of the dictionary. The following two examples 
present a microstructural model for the two COVID-19 neologisms wuhanphyeylyem
‘Wuhan pneumonia’ and K-pangyek ‘K-quarantine’,¹⁵ with metalinguistic information
on the various components of the microstructure indicated in italics.


14 In the beginning, newspapers also called the disease wuhanphyeylyem ‘Wuhan pneumonia’ in
the sense that it first appeared in Wuhan, China, and manifested as a pneumonia.


Kholona-19 ‘corona[virus]-19’                                   1810460
sincongkholonapailesukamyemcung ‘novel coronavirus infection’   271123
wuhanphyeylyem ‘Wuhan pneumonia’                                22073
Kholonapailesukamyemcung-19 ‘coronavirus infection 19’          3310

Figure 3: Frequencies of COVID-19 appellations in articles (from January 2020 to March 2021).


wuhanphyeylyem ‘Wuhan pneumonia’                                95117
Kholona-19 ‘corona[virus]-19’                                   32704
wuhankholona ‘Wuhan corona[virus]’                              16895
taykwukholona ‘Daegu corona[virus]’                             5603
cwungkwukphyeylyem ‘China pneumonia’                            4950
cwungkwukkholona ‘China corona[virus]’                          4016
taykwuphyeylyem ‘Daegu pneumonia’                               2665
ccangkkayphyeylyem ‘Chinky pneumonia’                           1221
sincongkholonapailesukamyemcung ‘novel coronavirus infection’   817
ccangkkaykholona ‘Chinky corona[virus]’                         365
kholonapailesukamyemcung-19 ‘coronavirus infection 19’          57

Figure 4: Frequencies of COVID-19 appellations in comments (from January 2020 to March 2021).


(14) Example of microstructure for wuhanphyeylyem ‘Wuhan pneumonia’

Korean headword (original Hanja form), grammatical category:
    wuhanphyeylyem ‘Wuhan pneumonia’ (武漢肺炎) compound

Definition and register information:
    Another term for COVID-19, which may be considered offensive.

Socio-cultural context:
    The term was first used to refer to an infectious respiratory disease, which was
    caused by a new coronavirus identified in Wuhan, China, in December 2019,
    and later led to the pandemic outbreak.

Usage information:
    In February 2020, the World Health Organization (WHO) announced that
    words indicating geographical locations should not be included in the name of
    the disease and officially named it ‘COVID-19’.

Example from an article:
    In the case of COVID-19, it used to be called Wuhan Pneumonia after the city
    of Wuhan, China, where the outbreak originated.

Example from a comment:
    Why on earth does this disease keep changing names so often? Should we
    call it Wuhan pneumonia? Novel coronavirus? COVID-19?

Synonyms:
    COVID-19, Wuhan corona


15 The definitions are from Nam et al. (2021: 147-148), to which additional information and exam-
ples extracted from the Comment corpus were added based on the results of the present study.


(15) Example of microstructure for K-pangyek ‘K-quarantine’


Korean headword (original English and Hanja forms), grammatical category:
    kheyi-pangyek ‘K-quarantine’ (K防疫) compound

Morphological and cultural information:
    [K: Korea]

Definition:
    Term referring to the quarantine system implemented by the South Korean
    government to deal with infectious diseases and which began to be used
    with the global pandemic of COVID-19.

Example from an article:
    K-quarantine has become a global standard and is rebuilding the economy
    faster than anything else in the world.

    K-quarantine budget has been drastically increased to 1.8 trillion won.

Example from a comment:
    The time of boasting about K-quarantine is long gone and now not
    a single vaccine for 600 infected people is a pathetic sight.

    After spending 120 billion won of taxpayers’ money to promote
    K-quarantine, the medical team is worn, sickbeds are scarce and there’s no
    vaccine prepared.

    K-quarantine is pure propaganda for the general elections: in fact, the
    quarantine has failed.
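For a born-digital dictionary, the microstructure proposed in (14) and (15) could be held in a simple record type. The sketch below is one possible representation, not a format proposed in this study: the class name, field names, and the `add_example` helper are hypothetical, and the field inventory simply mirrors the components labelled in the two examples.

```python
from dataclasses import dataclass, field


@dataclass
class NeologismEntry:
    """One dictionary entry, with usage examples grouped by genre."""
    headword: str
    gloss: str
    original_form: str = ""
    category: str = ""
    morphology: str = ""
    definition: str = ""
    register_note: str = ""
    sociocultural_context: str = ""
    usage: str = ""
    examples: dict[str, list[str]] = field(default_factory=dict)
    synonyms: list[str] = field(default_factory=list)

    def add_example(self, genre: str, text: str) -> None:
        """Attach an example under its source genre (e.g. 'article', 'comment')."""
        self.examples.setdefault(genre, []).append(text)


# Entry filled in with material from example (15)
entry = NeologismEntry(
    headword="kheyi-pangyek",
    gloss="K-quarantine",
    category="compound",
    morphology="[K: Korea]",
)
entry.add_example("article", "K-quarantine budget has been drastically increased to 1.8 trillion won.")
entry.add_example("comment", "K-quarantine is pure propaganda for the general elections: in fact, the quarantine has failed.")
```

Keeping examples keyed by genre makes it straightforward to display article and comment examples side by side, or to filter out comment examples in dictionaries where they are not wanted.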


The value of comment data in lexicographic description ultimately lies in the pragmatic 
information and the socio-cultural background it provides on headwords and which are 
not easily seen in existing dictionaries. Moreover, unlike articles, comments are pro- 
duced by a multitude of commenters and reflect their emotions and stances in relation 
to the relevant neologisms, providing dictionary users and future generations with fresh, 
raw examples of real-life language for neologism headwords. Korean neology research 
has thus far focused on article texts, limiting the scope of information on neologisms. To 
cope with this shortcoming, it is necessary to examine the emergence of neologisms in 
comments and other genres so as to study and describe the various attributes of neologisms. 
COVID-19 neologisms, in particular, have proliferated over the past year or so to 
express, describe, and comment on a global phenomenon, constituting an unprecedented 
case of profuse and multifaceted neological creativity centered on a single topic. 
This is precisely what this study sought to grasp by analyzing the differences in distribu- 
tion and trends of COVID-19 neologisms across the two genres of articles and comments. 
Ultimately, this paper reflected on ways to apply the fruits of such research in the practical 
domain of lexicography and showed that the raw language of commenters, despite 
its many issues, has its place in dictionaries alongside the language of experts. 
Pedro J. Bueno, Judit Freixa 

Lexicographic detection and representation 
of Spanish neologisms in the COVID-19 
pandemic 


1 Introduction 


The syntagma gel hidroalcohólico ‘hydroalcoholic gel’ or the noun hidroalcohol ‘hy- 
droalcohol’ cannot be found in Diccionario de la lengua española (DLE) of the Real 
Academia Española (‘Royal Spanish Academy’) or other general reference dictionar- 
ies of the Spanish language. This is so despite the fact that, for well over a year and 
to this very day, we have not been able to do anything without first sanitising our 
hands with this product. It is one of the many neologisms that the COVID-19 pan- 
demic has brought us, and these have become commonly used words that dictio- 
naries should consider as candidates for future updates. 

By looking at the dictionarisability of these neologisms, in this work we try to locate 
them on the continuum along which they fall. “Dictionarisability” means, in our context, 
the greater or lesser interest that these units hold for the updating of 
general language dictionaries. At both ends of this continuum, there are surprising 
nonce words, as well as neologisms that have recently lost their status as such be- 
cause they have now been incorporated into the dictionary. To identify different 
groups on the continuum of pandemic neologisms, we take into account the criteria 
proposed in the current literature and, by so doing, we are able to assess the extent 
to which they are discriminatory. This will allow us to address the neological process 
and to reflect on the various stages of it, from the time a neologism is born until the 
moment it ceases to be one because it has been dictionarised. 

Before that, however, we present the framework of our study and refer to the 
mechanisms available for detecting neologisms in general and pandemic neolo- 
gisms in particular. 


Note: This article was prepared with the help of the LEXICAL project “Neología y diccionario: análi- 
sis para la actualización lexicográfica del español” of the Ministry of Economy and Competitiveness 
(ref. PID2020-118954RB-100), funded by the State Research Agency (AEI) and the European Regional 
Development Fund (ERDF). 


Pedro J. Bueno, Universitat Pompeu Fabra, Barcelona, Spain, email: pedrojavier.bueno@upf.edu 
Judit Freixa, Universitat Pompeu Fabra, Barcelona, Spain, email: judit.freixa@upf.edu 
2 Study framework 
2.1 Defining what we understand by pandemic neologism 


In most of the neology literature in Spanish, a neologism is considered to be a new 
word, either formally or semantically, or taken from another language.¹ However, 
we will use the following definition of a neologism: “A recent word that is in the 
process of becoming established in a language”² (Freixa, in press). Thus, a recent 
word counts as a neologism only once corpus data show the first signs of its common use 
by speakers. 

In this case, the words that we consider ‘recent’ are the following: a) all the 
terms related to the COVID-19 pandemic that first appeared between January 2020 
and June 2021, and b) those that had appeared earlier but have experienced a marked 
increase in use during the pandemic. 


2.2 Detection of pandemic neologisms 


Novelty is the main characteristic of neologisms and, since novelty is a subjectively 
perceived quality, a methodological criterion must be established to obtain data 
objectively. This criterion will necessarily be separate from the theoretical under- 
standing of the concept of neologism. Moreover, it will always be an unsatisfactory 
one because we will be trying to square the circle. Assuming these limitations, the 
most reliable criterion for the detection of neologisms will be the comparison of 
analysis texts (necessarily current, these texts are the ones from which neologisms 
are expected to be extracted) with an exclusion corpus that must be capable of 
being deemed representative of the language. Ideally, this corpus should be a bal- 
anced body of texts in terms of discursive genres, themes and linguistic varieties, 
and it should include historical and current language. Thus, all the lexical units 
documented in the current texts that do not appear in the corpus deemed represen- 
tative of the language may be considered new. 

However, most neology observatories around the world do not have such an 
ideal corpus or the equipment to exploit it, so the exclusion corpus is usually 
a lexicographic corpus composed of one or more dictionaries deemed representative 
of the language on which work is being done. In the case of Spanish neology 


1 The origin of this definition can be found in the early authors of French lexicology, who faced 
the challenge of defining such an undefinable concept, such as Matoré (1952), Guilbert (1975) and 
Rey (1976). 

2 Our definition is clearly inspired by Hohenhaus (2007: 18) who argues that neologisms are 
“words that are ‘young’, diachronically speaking, but which nevertheless have already entered the 
language as more or less institutionalised vocabulary items”. 
observatories, this method is used, and every unit from the analysed text not found 
in the exclusion corpus formed by DLE or other general reference dictionaries of the 
Spanish language is regarded as a neologism. 
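
As an illustration only, the comparison between an analysis text and an exclusion corpus can be sketched in a few lines of Python. The function name, the tokenisation rule and the toy exclusion list are assumptions made for this sketch; they do not describe how any observatory's software is implemented, and a real pipeline would lemmatise the text before looking words up.

```python
import re


def lexicographic_candidates(text, exclusion_lemmas):
    """Return the word forms of `text` that are absent from the exclusion corpus.

    Both arguments are hypothetical stand-ins: `text` for an analysis text,
    `exclusion_lemmas` for the headword list of one or more dictionaries.
    """
    # Runs of Unicode letters, lowercased; digits and punctuation are ignored.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    candidates = []
    for token in tokens:
        if token not in exclusion_lemmas and token not in candidates:
            candidates.append(token)
    return candidates


# Toy exclusion corpus (invented for the example, not taken from DLE).
toy_exclusion = {"el", "gel", "es", "un", "producto", "nuevo"}
print(lexicographic_candidates("El gel hidroalcohólico es un producto nuevo", toy_exclusion))
```

Everything the function returns is only a lexicographic neologism in the sense discussed in the next paragraph: absence from the dictionary is all that has been established.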

When the criterion for the detection of neologisms is determined in this way, it 
is called the lexicographic criterion (for the detection of neologisms). Criticism can 
easily be levelled at it (and it is widely criticised, indeed) because it does not dis- 
criminate neologisms from other words not found in the dictionaries for other reasons. 
As discussed in Bueno/Freixa (2020), by using the lexicographic criterion, what we ac- 
tually get are lexicographic neologisms, some of which are true neologisms while others 
are pseudoneologisms. The following are considered as pseudoneologisms: a) morpho- 
logically regular and semantically transparent non-new words, whose meanings can 
be deduced from words and/or elements already found in dictionaries (this is the rea- 
son why dictionaries are reluctant to accept them); b) specialised lexical units (terms) 
that are already in the corresponding terminology dictionaries, whose novelty is sim- 
ply the fact that they have entered general use; c) colloquialisms, non-recent units 
that dictionaries do not systematically include; d) old and new, general and special- 
ised, frequent and occasional loanwords that, due to language policy criteria, dictio- 
naries restrictively select for their lists of words; e) words bearing witness to an era 
and a place that are generally not likely to have a long course to run in society; f) local- 
isms and dialectalisms that, again, dictionaries do not systematically include because 
of their lack of general use; g) nonce words, which appear for reasons that are more 
expressive than denominative, have a strong playful component and are not necessarily 
intended to become part of general language; and h) variants, errors and 
other non-new units that are not found in dictionaries for various reasons and, by ap- 
plying the lexicographic criterion, also become pseudoneologisms. 

However, neology observatories are led by linguists who are well aware of 
these shortcomings and therefore filter neologisms by the type of research that is 
intended to be carried out. To do this, all lexicographic neologisms are accompa- 
nied by different pieces of information relating to linguistics (type of neologism, 
grammatical category, etc.), use (type of text, context, linguistic markers, fre- 
quency, etc.) and documents (relationship to words already documented, presence 
in other dictionaries, etc.). 

Currently, the detection of neologisms is carried out using information technology 
tools designed for this purpose. In the case of the Barcelona Neology Observatory,³ 
the tool is called Buscaneo;⁴ it was developed by the group itself in 2004 
and is now used by all the Spanish neology observatories. Buscaneo scans the press 
and looks up every word in the computerised dictionary. To those it cannot 


3 https://www.upf.edu/web/obneo (last access: 10 June 2022). 
4 http://obneo.iula.upf.edu/buscaneo/ (last access: 10 June 2022). 
find, it applies filters to reject proper nouns and other uninteresting units. For the 
remaining ones, Buscaneo provides an interface allowing users to complete an 
entry form, adding data or information to fields that the program cannot automati- 
cally complete. 

Buscaneo (like other automatic neology detectors), which is currently used to 
extract words from different types of written text (newspapers, magazines, Twitter), 
makes the task of detecting and recording neologisms considerably less onerous 
and offers a high degree of reliability. However, it has two limitations that, to date, 
can only be overcome by performing an additional manual extraction: first, such 
programs cannot detect semantic or syntactic neologisms (because, formally, they 
are already in the dictionary) or compound units (because the search strategy is 
monolexical); and second, they are not yet ready to work with oral-based texts, 
which are crucial to the study of lexical innovation because they are texts with a 
more spontaneous style. 
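
The first limitation can be made concrete with a minimal sketch. The function and the toy dictionary below are hypothetical and serve only to show why a word-by-word lookup stays silent on a multiword unit whose components are all dictionary words.

```python
def invisible_to_monolexical_search(unit, dictionary):
    """True when every word of a multiword unit is already a dictionary word,
    so that a word-by-word lookup has nothing to flag."""
    return all(word in dictionary for word in unit.lower().split())


# Toy dictionary (invented for the example).
toy_dictionary = {"nueva", "normalidad", "distancia", "social", "gel"}

for unit in ["nueva normalidad", "distancia social", "gel hidroalcohólico"]:
    print(unit, invisible_to_monolexical_search(unit, toy_dictionary))
```

Under these assumptions, nueva normalidad and distancia social pass unnoticed, whereas gel hidroalcohólico is caught only because one of its components is itself missing from the dictionary.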


2.3 The neological process 


Beyond the discussion about which words are neological and which are not, we believe 
that, from a lexicographic perspective, it is more interesting to try to explain 
the neological process: a process that begins when a word is born and ends when it 
becomes a unit that is sufficiently well-established in social use to be included in a 
general dictionary (although such formalisation may not occur for reasons specific 
to a particular dictionary). This matters because neologisms at a more advanced stage 
of the neological process should be the first to be recorded in dictionaries. 

This dynamic and complex vision of a neologism is based on the debate initi- 
ated by Bauer (1983), with the distinction of three moments in the establishment of 
a new word: the first occurrence, called a nonce word, followed by institutionalisa- 
tion in use, and lastly by lexicalisation. That vision reached its culminating point 
with the work by Schmid (2008), who offered a much more comprehensive evolu- 
tionary process that split the evolution of a new word — from its first appearance to 
the end of its journey — into three stages, which he called creation, consolidation 
and establishment. At each stage, three processes take place simultaneously until 
the end of the road: firstly, at the structural level, lexicalisation occurs, which is the 
formal process from the creation of the word to its fixation; secondly, at the socio- 
pragmatic level, a neologism spreads among speakers and is potentially institution- 
alised; and thirdly, at the cognitive level, the concept is hypostatised, and speakers 
incorporate the lexicalised unit into their mental lexicon. 

Based on Schmid's (2008) approach and Kerremans’ (2015) review, Freixa (in 
press) tries to identify different neological behaviours. Of course, a nonce word 
comes first because it is the one that starts the process off. If it stops at that first 
occurrence, it will remain as such and not be a neologism proper, precisely because 
it meets just a momentary expressive need. 

Ephemeral neologisms come second. These are units that manage to acquire a 
certain frequency of use and, in accordance with Schmid (2008), also start the pro- 
cess at the cognitive and structural levels with hypostatisation of the concept and 
lexicalisation of the form. However, the process then stops because the neologism 
soon falls into disuse for some reason (but, ultimately, because the concept or form 
ceases to hold any interest for speakers). 

If they do not stop at nonce words and are not characterised as being ephem- 
eral, neologisms can follow the stabilisation process in different ways. Renouf 
(2013) referred to the evolution of neologisms as their life-cycle, based on the obser- 
vation of their frequency. She identified several stages: birth, increase in frequency 
and occurrence, establishment, death and revival (2013: 182): 


The diachronic approach to the study of neologisms in text allows us to observe the existence 
of a measurable ‘life-cycle’ for each word. According to this metaphor, used by analogy with a 
human life-span, the life-cycle of a word is conceived as consisting of some or all of the follow- 
ing major stages: birth, or perhaps just first occurrence in text; possible increase in frequency 
and occurrence; productivity, creativity, settling down, assimilation and establishment in the 
language, obsolescence, possible death — and possible revival. 


Similarly, in Freixa (in press) the histograms of a set of Spanish neologisms were 
studied and the following behaviours were identified: first, the ideal neologism, 
characterised by a sustained rise, which necessarily shows that the process has not 
concluded; second, the logical neologism, characterised by a rise followed by 
stabilisation; third, the realistic neologism, which rises, falls and then stabilises; 
and lastly, the variable neologism, which fluctuates between more or less pronounced 
rises and falls. 
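
One possible way of operationalising these four behaviours on a yearly frequency series is sketched below. The 10% tolerance, the function names and the four series are assumptions invented for the illustration; they are not the procedure or the data used in Freixa (in press).

```python
def trend_shape(freqs, tol=0.1):
    """Reduce a yearly frequency series to a sequence of 'up', 'down' and 'flat'
    steps, merging consecutive steps of the same kind. `tol` is an assumed
    tolerance: changes within +/-10% of the previous value count as 'flat'."""
    steps = []
    for prev, cur in zip(freqs, freqs[1:]):
        if cur > prev * (1 + tol):
            step = "up"
        elif cur < prev * (1 - tol):
            step = "down"
        else:
            step = "flat"
        if not steps or steps[-1] != step:
            steps.append(step)
    return tuple(steps)


def classify(freqs, tol=0.1):
    """Label a series with one of the four behaviours described in the text."""
    labels = {
        ("up",): "ideal",                      # sustained rise
        ("up", "flat"): "logical",             # rise, then stabilisation
        ("up", "down", "flat"): "realistic",   # rise, fall, then stabilisation
    }
    return labels.get(trend_shape(freqs, tol), "variable")


# Invented series, one per behaviour.
print(classify([1, 10, 100, 1000]))
print(classify([1, 50, 100, 102, 101]))
print(classify([1, 100, 60, 61, 60]))
print(classify([1, 100, 40, 90, 30]))
```

Any series that does not match one of the three fixed shapes is treated as variable, which is a simplification: a different tolerance would move borderline cases from one label to another.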

In this paper, we intend to show how much progress the different Peninsular 
Spanish pandemic neologisms detected by the Barcelona Neology Observatory have 
made in the neological process, and whether the behaviours observed in Freixa (in 
press) can be confirmed. We will also offer some examples of the lexicographic re- 
presentation that some neologisms already dictionarised have received. 


3 Corpus and methodology 


The neologisms that we analyse were obtained by manual and automatic extraction 
from oral texts (radio) and written texts (high circulation newspapers, magazines 
and Twitter accounts) using the lexicographic criterion mentioned above. 
The corpus comprises 209 COVID-19-related neologisms that either appeared 
for the first time in 2020 and the first half of 2021, or had appeared earlier but expe- 
rienced a striking increase over this period. The data were extracted from the 
BOBNEO database,⁵ but data relating to frequency were supplemented by consulting 
Factiva,⁶ the world’s biggest press database. In the corpus we observed that the 
frequency of some words was negligible or even non-existent until the beginning of 
the pandemic, as in the case of nueva normalidad ‘new normality’, which numbered 
910 occurrences in 2015, as a non-lexicalised collocation, and reached 162,843 
in 2020. We also noticed the extraordinary rise of covid and coronavirus, amounting 
to more than two and a half million occurrences in just a year, and the emergence 
of some words exclusively related to the pandemic, such as anticovid, which show 
no gradual evolution and came into use in 2020 with a high frequency from the start. 

Based on these results, for the analysis we divided the neologisms into different 
groups, which form a continuum, by taking into account their frequency over the 
past twenty years (the chart shows the last three years only). We obtained the six 
groups in Table 1, following a progression in base 10. The table also shows the fre- 
quency results from the BOBNEO neologism database over the past thirty years to 
supplement the previous ones. As can be seen, the neologisms are fairly evenly dis- 
tributed except in groups 4 and 5, where a greater concentration of cases occurs. 


Table 1: Pandemic neologisms in frequency groups. 


Group   F Factiva        F OBNEO   Number 
1       0-9              1         21 
2       10-99            1-4       26 
3       100-999          1-9       27 
4       1,000-9,999      1-20      67 
5       10,000-99,999    1-50      40 
6       +100,000         +50       28 

Total                              209 
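
Because the Factiva ranges in Table 1 follow a progression in base 10, the group of a neologism can be read off the number of digits of its occurrence count. The following is a minimal sketch of that mapping; the function name is hypothetical, and the two counts are the totals reported for these units in Section 4.2.

```python
def frequency_group(occurrences: int) -> int:
    """Map a Factiva occurrence count to frequency groups 1-6 of Table 1."""
    if occurrences < 0:
        raise ValueError("occurrence counts cannot be negative")
    # Each group spans one order of magnitude, so the digit count gives the
    # group directly; everything from 100,000 upwards falls into group 6.
    return min(len(str(occurrences)), 6)


for unit, total in [("apoyo respiratorio", 651), ("azitromicina", 3670)]:
    print(unit, frequency_group(total))
```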


For the analysis, information on the horizontal axis of Factiva (age) was also taken 
into account, and neologisms were labelled according to whether they were first 
documented in 2020 or whether they already existed, in which case, their distribu- 
tion was calculated over the years. 

As we can see in the last row of Table 2, the neologisms that appeared in 2020 
represent one third of the total, but the table shows how they are distributed ac- 
cording to their frequency of appearance: in the more frequent groups of neolo- 
gisms, the percentage of new ones is 14-15%, whereas in the less frequent groups 
of neologisms, the percentage of new ones is higher than 80%. This correlation be- 
tween age and frequency is quite logical. 


5 http://obneo.iula.upf.edu/bobneo/index.php (last access: 10 June 2022). 
6 https://global.factiva.com (last access: 10 June 2022). 
Table 2: Age of pandemic neologisms by frequency group. 


Group   Earlier   New   Number 
1       1         20    21 
2       6         20    26 
3       20        7     27 
4       53        14    67 
5       34        6     40 
6       24        4     28 

Total   138       71    209 
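
The percentages quoted above can be recomputed from Table 2. The short sketch below does so for the two least frequent groups taken together, the two most frequent groups taken together, and the corpus as a whole; the variable and function names are assumptions made for the example.

```python
# Table 2 as (earlier, new) counts per frequency group.
table2 = {1: (1, 20), 2: (6, 20), 3: (20, 7), 4: (53, 14), 5: (34, 6), 6: (24, 4)}


def share_new(groups):
    """Percentage of neologisms first documented in 2020 within `groups`."""
    earlier = sum(table2[g][0] for g in groups)
    new = sum(table2[g][1] for g in groups)
    return round(100 * new / (earlier + new), 1)


print("groups 1-2:", share_new([1, 2]))
print("groups 5-6:", share_new([5, 6]))
print("all groups:", share_new(range(1, 7)))
```

Taken together, groups 1 and 2 contain 40 new units out of 47, groups 5 and 6 contain 10 out of 68, and the whole corpus 71 out of 209.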


4 Analysis 


For the analysis, we took our corpus of pandemic neologisms, organised into differ- 
ent groups by their frequency, and assumed that the more frequent they were, the 
more dictionarisable they would be. But, based on the most recent literature on up- 
dating of dictionaries (Metcalf 2002, Ishikawa 2006, O’Donovan/O’Neill 2008, Cook 
2010, Adelstein/Freixa 2013, Freixa 2016, Nam et al. 2016, Freixa/Torner 2020, 
Klosa-Kückelhaus/Wolfer 2020, Bernal et al. 2020, among others), we also assumed 
that neologisms would have greater or lesser lexicographic interest depending on 
how long they had been in use (age), their denominative or stylistic function, their 
formation mechanism, and other aspects such as record of use. 

To observe the extent to which trends in the units’ dictionarisation and formation 
mechanism exist, we take into account the results shown in Table 3, where it is 
possible to see how the neologisms in each frequency group are distributed by the 
type of neologism in question. We do not, of course, intend to draw conclusions 
from a corpus of just over 200 examples and subgroups of such low numbers, but we do 
want to comment on the trends observed. 

Little can be said about the first five types, since almost no examples were 
found, but Table 3 shows trends that are taken into account in the analysis, such as 
the concentration of neologisms formed by blending and neoclassical compounding 
in the groups where frequency is lower, the concentration of syntagmatic neolo- 
gisms in the groups where frequency is higher, or the concentration of prefixed neo- 
logisms in the intermediate group. 

In the analysis discussed below, we have put the six groups into three blocks 
due to the small corpus of examples. As we shall see, these three blocks have internal 
consistency: we can consider those in frequency groups 1-2 as non-dictionarisable 
neologisms, those found in groups 3-4 as neologisms in the antechamber of dictionar- 
isation and, lastly, those in groups 5-6, where frequency is higher, as dictionarisable 
neologisms. 
Table 3: Distribution by types of pandemic neologism, by frequency group. 


Type 1 2 3 4 5 6 Total 
abbreviation 1 1 
conversion 1 1 
initialism 1 1 2 
borrowing from English 1 1 2 4 
semantic change 3 1 1 5 
compounding 1 3 4 1 1 10 
blending 5 7 1 2 1 16 
suffixation 2 3 3 6 3 17 
neoclassical compounding 9 5 7 9 5 4 39 
prefixation 2 3 9 19 7 7 47 
syntagmatic compounding 2 4 7 24 20 10 67 


TOTAL 21 26 27 67 40 28 209 


4.1 Non-dictionarisable neologisms 


In the main, the metaphor of war has been used to frame the discourse around the 
crisis caused by the COVID-19 pandemic. Most world leaders have done so, al- 
though some sectors, especially healthcare, have pointed out that this should not 
have been the mindset conveyed to the population. But it has been, and continues 
to be, because it has been observed that the general public reacts obediently to this 
approach (Sabucedo et al. 2020). 

There are, however, different ways of dealing with a crisis, both socially and 
individually, and words that are heavily loaded with humour or criticism have also 
appeared in the vocabulary generated by the pandemic. Thus, rather than meeting 
a denominative need, some of the pandemic neologisms fulfil an expressive one 
that sometimes seeks to find the funny side of the situation to make it more bear- 
able. These are nonce words. 

In our corpus, nonce words account for almost a quarter of the total number of 
neologisms (47 out of 209). We found pure nonce words (group 1, 21 examples), i.e., 
those that have a really low frequency. But, by extending the concept of nonce 
word, we have also considered disseminated nonce words (group 2, 26 examples), 
i.e., those spread via social media, with a slightly higher frequency, although they are 
still occasional lexical events in the language. 
More than half of these examples are formed by neoclassical compounding, a 
mechanism whose playfulness lies precisely in the seemingly serious and specialised 
result it yields (teletrabajopatía ‘compulsive teleworking’, metacrisis, boeólogo -ga 
‘boeologist’), or by blending, a word formation mechanism to which the literature 
has attributed a transgressive character (Hohenhaus 2007, Renner 2015, 
Winter-Froemel/Zirker 2015). In this case, the most recurrent blending involves the corona 
element (coronapincho ‘coronaspike’, coronahambre ‘coronahunger’, coronamiedo 
‘coronafear’). For this reason, some authors refuse to consider them as neologisms (Gérard 
2018, Klosa-Kückelhaus/Wolfer 2020, Bueno/Freixa 2020) while not seeking to take 
away their value; indeed, the study of these units allows us to find out about speakers’ 
resources and dynamics in terms of linguistic creativity. 


4.2 Neologisms in the antechamber of dictionarisation 


The block of pandemic neologisms that falls in the central or mean frequency space 
is the most numerous one and comprises 27 group 3 neologisms (up to 1,000 occur- 
rences in Factiva) and 67 group 4 ones (up to 10,000 occurrences in Factiva). These 
are, therefore, neologisms that have clearly begun the neological process, but, as 
we shall see in the analysis, have not yet completed it. 

Social institutionalisation is certainly underway, but most have not been around 
long enough, as only a quarter of these neologisms had been documented previously. 
In this case, they are non-neological units in specialised use, and the novelty lies in 
their spread to general use: azitromicina ‘azithromycin’ has been documented since 
1997 and has a total of 3,670 occurrences, of which 2,451 were observed in 2020 (in 
previous years, there were no more than 240 a year); in a lower frequency range, 
apoyo respiratorio ‘respiratory support’ has a total of 651 occurrences since it was first 
documented in 1995, of which 440 were observed in 2020 (in previous years, there 
were no more than 34 a year). Other units like these are test serológico ‘serological 
test’, pluripatología ‘multipathology’, presintomático -ca ‘presymptomatic’, etc. These 
units will most likely not complete institutionalisation in general use, and will return 
to specialised use, although this will depend on what happens with the pandemic we 
are still experiencing. 
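
The weight of 2020 in these two histories can be made explicit with a small calculation on the figures just quoted. The function name is an assumption made for this sketch.

```python
def share_in_year(year_count, total):
    """Percentage of all documented occurrences that fall in a single year."""
    return round(100 * year_count / total, 1)


# Figures quoted in the text: occurrences in 2020 and total since first documentation.
print("azitromicina:", share_in_year(2451, 3670))
print("apoyo respiratorio:", share_in_year(440, 651))
```

In both cases roughly two thirds of all occurrences recorded since the mid-1990s date from 2020 alone, which is what marks these terms as units spreading from specialised to general use.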

The abandonment of the neological process that some units have initiated will 
also depend on how the pandemic develops: ephemeral neologisms are units that 
disappear from use when they are dependent on a passing social phenomenon (be 
it a technological discovery, a health crisis or perhaps something related to the 
fashion world). Covidiota ‘covidiot’, balconero -ra ‘balconer’, telecolegio ‘teleschool’, 
coronabono ‘coronabond’, grupo burbuja ‘bubble group’, to mention a few, 
may disappear from use before they become stable. But we must bear in mind that 
a characteristic feature of ephemeral neologisms is that their birth may occur more 
than once, i.e., a neologism that did not become institutionalised may have new 
opportunities. Coronavirus, for example, has been sporadically documented in high 
circulation newspapers for more than 20 years, but it had an opportunity to become 
institutionalised in 2003, when the number of occurrences reached more than 
1,000 due to the severe acute respiratory syndrome coronavirus (SARS-CoV) epi- 
demic in Southeast Asia. However, the word's appearance became residual in just 
two years. A new attempt to become institutionalised occurred in 2015, with the 
Middle East respiratory syndrome coronavirus (MERS-CoV). Although its high fre- 
quency peak lasted only one year, coronavirus remained in use with about 100 oc- 
currences per year until 2020, when it finally became institutionalised. 

According to Schmid (2008), in the establishment of a word or what we call the 
neological process, besides institutionalisation in use, lexicalisation⁷ occurs at the 
structural level and hypostatisation takes place at the conceptual level. Lexicalisa- 
tion is a process of linguistic fixation of a new word's formal and semantic aspects, 
and thus it acquires a more precise meaning and a less variable form. This process, 
which is initiated with the first occurrences of a neologism, does not appear to have 
been completed in some of the examples making up the block of neologisms under 
analysis. For example, in Table 4, we can see that the neologism distancia social 
‘social distance’ coexists with a diverse range of forms that show different degrees 
of social institutionalisation. These variants display the most defining semantic 
features of the concept, and together show that there has not yet been the 
formal fixation that, to some extent, lexicalisation entails (although the number of 
occurrences does inform us of the preferred variants in use). 


Table 4: The neologism distancia social ‘social distance’ and its variants. 


Neologism                                           Occurrences in Factiva 
distanciamiento social ‘social distancing’          114,386 
distancia de seguridad ‘security distance’          101,773 
distancia social ‘social distance’                  68,134 
distanciamiento físico ‘physical distancing’        19,154 
distancia interpersonal ‘interpersonal distance’    17,084 
distancia física ‘physical distance’                17,018 
distancia sanitaria ‘sanitary distance’             375 
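
To see how far the competing forms are from formal fixation, the counts in Table 4 can be turned into relative shares. The sketch below is only an illustration: the shares are relative to the seven variants listed here, not to every form that may exist in the press, and the variable names are assumptions.

```python
# Occurrences in Factiva, copied from Table 4.
variants = {
    "distanciamiento social": 114386,
    "distancia de seguridad": 101773,
    "distancia social": 68134,
    "distanciamiento físico": 19154,
    "distancia interpersonal": 17084,
    "distancia física": 17018,
    "distancia sanitaria": 375,
}

total = sum(variants.values())
ranking = sorted(variants.items(), key=lambda item: item[1], reverse=True)

for form, count in ranking:
    # Share of each variant among the listed forms.
    print(f"{form:26s} {count:>8,d} {100 * count / total:5.1f}%")
```

No single variant reaches half of the listed occurrences, which is consistent with the observation that lexicalisation has not yet selected one form.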


And lastly, the concepts denominated by these neologisms cannot be deemed hypo- 
statised by the majority of speakers. When a speaker is faced with a new word, he or 
she analyses its morphological constituents. The more transparent and less ambigu- 
ous the morphological structure of the word is, the faster the process of understanding 


7 See Lipka et al. (2004) for a review of the concepts of institutionalisation and 
lexicalisation. 
it will be. And, depending on its level of semantic transparency, the formation of the 
new concept will be easier or harder. Such semantic transparency is determined by 
the frequency of the constituents, the number of existing lexemes with those constitu- 
ents, and the semantic relationship between them. In addition, the information pro- 
vided by the co-text and the context influences the development of the new concept 
(Schmid 2008). Some of the neologisms in this block are at an advanced stage of hypostatisation 
(mascarilla higiénica ‘hygienic mask’, posconfinamiento ‘postconfinement’, 
antimascarillas ‘antimasks’), but others are not for a variety of reasons, such as the 
fact that they are highly specialised units (gerontofobia ‘gerontophobia’, sobreinfección 
‘overinfection’, dexametasona ‘dexamethasone’). 

This is why we say that the neologisms in this block (frequency groups 3 
and 4) are in the antechamber of dictionarisation: it is not yet time for them 
to enter the dictionary. The lexicographic interest that these units hold will depend on the 
course they take over the coming years, which in turn will depend on the evolution 
of the COVID-19 pandemic. Some of them, i.e., those bordering on the block of more 
frequent neologisms, are more institutionalised in use, are more lexicalised units, 
and a higher number of speakers has already hypostatised the concept, but the neo- 
logical process has not yet been completed. 

Those neologisms that succeed in completing this process will then face selection 
by a dictionary, in line with its internal criteria. In relation to DLE, Bernal et al. (2020) 
have noted that the internal balance of the dictionary ultimately determines the decision-making. 
So, for example, in the dictionary update, those neologisms forming a 
derivative series are good candidates. But, of course, the series cannot be unlimited: 
the words infección ‘infection’, infectar ‘to infect’ and infeccioso -sa ‘infectious’ are already 
in DLE. In pandemic use, however, the derivatives sobreinfección ‘overinfection’, 
reinfección ‘reinfection’, reinfectar ‘to reinfect’ and reinfectado -da ‘reinfected’ 
are recurrent and, since all of them are predictable derivatives, the dictionary may not 
consider them necessary (Bernal, 2021). The same applies to the pandemia ‘pandemic’ 
family (postpandémico -ca ‘postpandemic’, prepandémico -ca ‘prepandemic’, 
antipandémico -ca ‘antipandemic’ and its variants) and the cuarentena ‘quarantine’ 
family (precuarentena ‘prequarantine’, postcuarentena ‘postquarantine’, semicuarentena 
‘semiquarantine’), among others. 

These neologisms are not usually included in general dictionaries and, at most, 
can be found in dictionaries of neologisms, especially those produced in digital 
format. This is the case with Antenario,⁸ a dictionary of neologisms updated monthly 
by the neology groups in the Antenas Neológicas network,⁹ with units from the different 
geolectal varieties of Spanish. In Antenario, more than 50 neologisms have 


8 Antenario: https://antenario.wordpress.com (last access: 10 June 2022). 
9 Antenas Neológicas: https://www.upf.edu/web/antenas (last access: 10 June 2022). 
already been published under the thematic label of Pandemia Covid-19 ‘COVID-19 
pandemic’. One of them is shown in Figure 1: 


reinfectarse v intr 


Año de la primera documentación: 2020 


Definición Volver a infectarse con el virus o la bacteria que produce una 
enfermedad después de haberse recuperado de ella. 


Contextos «Este caso muestra que es posible reinfectarse solo unos meses 
después de haberse curado de una primera infección», indicó en un 
comunicado el departamento de microbiología de la Universidad de 
Hong Kong (HKU). [El Universal (México), 24/08/2020] 


Ha advertido hoy de que han detectado que hay personas que han 
pasado la infección que no desarrollan la inmunidad y pueden 
reinfectarse. [El Periódico (España), 7/05/2020] 


Nodos ESP 
Diccionarios Alvar1 · Alvar2 · Clave · DAMER · DEA · DNEA · DUE4 · LAR


+ info Martes Neológico - Neologismo del mes 


Figure 1: Example of pandemic neologism published in Antenario. 


Antenario has opted for a blog-format dictionary with thematic, linguistic and
pragmatic tags, to which users can send their comments. As seen in Figure 1,
neologisms are accompanied by the usual information in the microstructure of a
dictionary (lemma, grammatical category, definition and examples) and by
complementary information on geolectal distribution and on the neologicity of the
word (its age and the dictionaries in which it is already documented).


4.3 Dictionarisable neologisms 


The 68 most frequent and, in principle, most dictionarisable neologisms can be found
in this block. They are more dictionarisable because they are the most institutionalised
ones in use and probably the most lexicalised and hypostatised ones too, because
lexicalisation and hypostatisation come from use. This block, which includes 40
neologisms with a frequency between 10,000 and 99,999 occurrences and 28 neologisms
with a frequency of at least 100,000 occurrences, also contains the highest percentage
of pre-existing neologisms (85.3% had already been documented prior to the pandemic).
It is therefore a set of neologisms that meet two of the criteria, frequency and age,
that are often mentioned in the literature for the purpose of assessing their
dictionarisation (Metcalf 2002, Ishikawa 2006, O’Donovan/O’Neill 2008, Cook 2010,
Adelstein/Freixa 2013, Freixa 2016, Freixa/Torner 2020). That literature also mentions
other criteria relating to use, which the pandemic neologisms in this group also fulfil,
such as currency (they are current neologisms, although all the pandemic neologisms
meet this criterion) and textual spread (they are used in texts of different types).

As for linguistic criteria, all the neologisms fulfil the criteria of correct formation
and semantic necessity because, although most have a predictable and compositional
meaning (semipresencial ‘semipresential’, gel hidroalcohólico ‘hydroalcoholic gel’,
supercontagiador ‘superinfecter’), speakers do not know their precise meaning.
In fact, the most lexicalised syntagmatic neologisms are concentrated in this
block; they are clearly denominative and, in this case, widespread in use: crisis
sanitaria ‘sanitary crisis’, presión hospitalaria ‘hospital pressure’, servicio esencial
‘essential service’, among others. While general dictionaries have tended to restrict
the incorporation of polylexematic units, DLE has gradually become more open to
units like these, which become subentries of existing words.

The neologisms in this block also meet documentary criteria because most of 
them are listed in pandemic-themed dictionaries that have recently appeared, such 
as the Diccionario de covid-19 (EN-ES)¹⁰ by the International Association of Medical
Translators and Writers and Related Sciences (Tremédica). Thus, they are neolo- 
gisms that have completed the neological process and, in fact, some of them have 
recently been incorporated into DLE, as we shall see. Close contact and social bub- 
ble are two of the pandemic neologisms already collected in the terminological dic- 
tionary published by Tremédica, as seen in Figure 2. 

As can be seen, the lexicographic representation is different in this case, as the
most important information for translators has been prioritised, precisely because
Tremédica is an international association of translators in medicine and related
sciences. Thus, as well as the equivalents in English, the synonyms in both
languages can also be consulted.


close contact


contacto estrecho; contacto cercano
Sinonimia (en): close-distance contact; high-risk contact.
Sinonimia (es): contacto de alto riesgo.
Concepto: se refiere a estar dentro de un radio de 2 metros de la persona contagiada por tiempo
prolongado y sin equipo de protección. Incluye a las personas que habitan bajo el mismo techo (→
household contact).

Ejemplo: According to the Center for Disease Control and Prevention, close contact is defined as: a) Being
within approximately 6 feet (2 meters) of a COVID-19 case for a prolonged period of time; close contact


social bubble


burbuja social
Sinonimia (en): bubble group; corona bubble; COVID bubble; COVID-19 bubble; germ bubble; quaranteam.
Sinonimia (es): burbuja; grupo burbuja; grupo de convivencia estable [GCE].
Concepto: fórmula de aislamiento social intermedia entre relacionarse con todo el mundo y no hacerlo con
nadie. Consiste en establecer un grupo cerrado de pequeño tamaño (por lo general, seis personas) con el
que, sin ser convivientes, se mantiene contacto de manera recurrente.


Figure 2: Examples of pandemic neologisms collected by Tremédica.


10 https://www.tremedica.org/tremediteca/glosarios/diccionario-de-covid-19-en-es/ (last access: 10 June 2022).


5 Already dictionarised pandemic neologisms


Fourteen of the neologisms in our corpus have already ceased to be neologisms
according to the lexicographic criterion because they have recently been incorporated
into DLE. These words are shown in Table 5, and reference is made to the frequency
group from our analysis for the purpose of seeing whether the dictionarised
neologisms matched the more dictionarisable ones:


Table 5: Pandemic neologisms incorporated into the RAE dictionary.


Neologism                              Group    Neologism                            Group
covid                                  6a       medicalizar ‘to medicalize’          4
coronavírico -ca ‘coronaviral’         4        pandémico -ca ‘pandemic’             5
coronavirus                            6        positivo ‘positive’                  5
cuarentenear ‘to quarantine’           2a       telemedicina ‘telemedicine’          5
desconfinamiento ‘de-confinement’      6        teletrabajador -ra ‘teleworker’      4
desconfinar ‘to de-confine’            4a       teletrabajo ‘teleworking’            6
desescalada ‘de-escalation’            6        videollamada ‘videocall’             6


Indeed, most of the incorporated neologisms are in the higher frequency range
(groups 5 and 6) although, as we can see, some of them are in the middle range
(groups 3 and 4) and one is in the lower frequency range (groups 1 and 2). We will
first focus our attention on the latter two ranges, that is, on the neologisms which,
because of their frequency, were not the best candidates for updating the dictionary.
The first and most exceptional one is cuarentenear ‘to quarantine’, a verb that occurs
just three times in BOBNEO and 91 times in FACTIVA, so it seems to be a nonce word
that has spread to some extent. Given that the verb cuarentenar ‘to quarantine’
already exists in DLE, the introduction of the verb ending in -ear might be linked to
a desire to provide better representation of non-peninsular varieties of Spanish,
since cuarentenear ‘to quarantine’ has mostly been documented in Latin American
countries.

The adjective coronavírico -ca ‘coronaviral’, the noun teletrabajador -ra ‘teleworker’
and the verbs medicalizar ‘to medicalize’ and desconfinar ‘to de-confine’ have,
in our opinion, been rightly dictionarised for the reasons set out below. These, as
Bernal et al. (2020) have already stated, are associated with DLE's internal criteria.
On the one hand, they all have a relatively high frequency (more than 1,000
occurrences in 2020) and, on the other, they all complete a derivative series of other
words that were already present or have been recently incorporated into the
dictionary: teletrabajador -ra ‘teleworker’ (lower frequency) is consistent with the
incorporation of teletrabajo ‘teleworking’ (but clearly inconsistent with the absence
of the verb teletrabajar ‘to telework’), and coronavírico -ca ‘coronaviral’ is relevant
since coronavirus and certain derivatives thereof have also been incorporated. In some
cases, the neologisms also meet the criterion of age: teletrabajador -ra ‘teleworker’
has been documented since 1995 and medicalizar ‘to medicalize’ since 1999, and the
cruciality (Sheidlower 1995) of both is evident, since they are not products of a
passing fad. All of them have a clear denominative function, had already been
documented in specialised dictionaries, and are terms about which users may have
some doubts regarding meaning or use (thus, for example, DLE gives two meanings for
medicalizar: “dotar a algo, como un medio de transporte, de lo necesario para ofrecer
asistencia médica” [to give something, such as a means of transport, what is needed
to offer medical care] and “dar carácter médico a algo” [to give something a medical
character]). Lastly, we should add that there is no characteristic in their formation
that would render them unsuitable candidates for updating DLE.

The verb desconfinar ‘to de-confine’ deserves special attention. We would argue
that its incorporation is justified in accordance with most of the criteria set out above,
such as the completion of a derivative series: confinar ‘to confine’ and confinamiento
‘confinement’ were already in the dictionary, so the incorporation of reversible forms
(desconfinar ‘to de-confine’ and desconfinamiento ‘de-confinement’) is as logical as
would be the incorporation of other members of the same family that have a similar
frequency of use and cruciality but have nevertheless been left out: preconfinamiento
‘preconfinement’, posconfinamiento ‘postconfinement’, reconfinamiento ‘reconfinement’
and autoconfinamiento ‘autoconfinement’. However, as already mentioned in previous
paragraphs, the criterion of completion of a derivative series is limited by the criterion
of formal and semantic predictability, which is used to reject units.

The other neologisms incorporated into DLE (Table 5) are in the higher frequency
groups in the consulted corpora; some appeared in 2020 while others had occurrences
in previous years, yet the cruciality of all of them has been evident during the
pandemic. In descending order, covid comes first, with four million occurrences in
Factiva (slightly more than coronavirus), followed, with much lower frequencies but
still in the highest frequency group, by desescalada ‘de-escalation’ (263,000) and
teletrabajo ‘teleworking’ (156,000). The other dictionarised neologisms, from the
frequency group ranging from 10,000 to 99,999 occurrences, are desconfinamiento
‘de-confinement’ (56,000), pandémico -ca ‘pandemic’ (52,424), videollamada ‘videocall’
(36,800) and telemedicina ‘telemedicine’ (31,392).

Figure 3 shows three of the pandemic neologisms already collected in DLE:


coronavirus 


Del ingl. coronavirus, de corona 'corona solar', por el aspecto del virus al microscopio, y este del lat.
corōna 'corona', y virus 'virus', y este del lat. virus 'veneno', 'ponzoña'.


1. m. Med. Virus que produce diversas enfermedades respiratorias en los seres humanos, desde 
el catarro a la neumonía o la COVID. 


cuarentenear 


1. intr. Pasar un período de cuarentena (|| aislamiento preventivo por razones sanitarias). Es más 
llevadero cuarentenear con alguien. 


2. tr. p. us. Poner algo o a alguien en cuarentena (|| aislamiento preventivo por razones 
sanitarias). Tendremos que cuarentenear el ganado. Las autoridades cuarentenearon el crucero. 


desconfinamiento 


1. m. Levantamiento de las medidas impuestas en un confinamiento. 


Figure 3: Three pandemic neologisms collected in DLE. 


Figure 3 also shows how the lexicographic representation fits this kind of dictionary,
in this case a general Spanish language dictionary which is also an academic
dictionary. Thus, for coronavirus, a neologism that speakers could consider
semantically unclear, the dictionary provides information about its origin and its
usage in the medical area. For cuarentenear ‘to quarantine’ or desconfinamiento
‘de-confinement’, words formed following the word formation rules of Spanish, this
information about origins is not provided, but linguistic and usage information is.

DLE’s rapid incorporation of these words is certainly positive. They meet various
dictionarisation criteria and their frequency is high. However, in line with these
criteria, many others may get the opportunity to be accepted into the dictionary in
future updates: examples such as gel hidroalcohólico ‘hydroalcoholic gel’, ensayo
clínico ‘clinical trial’, distancia social ‘social distance’ (or distanciamiento social
‘social distancing’, or variants deemed preferential, precisely pointing to usage) are
clearly denominative units, even if they would take the form of subentries because of
their syntagmatic nature. Equally necessary are other words formed by compounding,
such as infectólogo -ga ‘infectologist’, sociosanitario -ria ‘sociosanitary’,
semipresencial ‘semipresential’; by blending, such as conspiranoico -ca; or by
initialism, such as EPI ‘PPE’ and ERTE ‘furlough’. Likewise, the fact that other high
frequency neologisms have been left out is understandable because they are descriptive
syntagmas, such as those that have ‘crisis’ as their base (crisis del covid ‘covid
crisis’, crisis sanitaria ‘sanitary crisis’, crisis social ‘social crisis’), or because
they belong to different families of derivatives, especially with pre-, post- and anti-
attached to covid, coronavirus, pandemia ‘pandemic’ and other pandemic-related terms.

That said, they are neologisms that have become stable in use, and their incor- 
poration into the dictionary will depend on the criteria that the dictionary applies 
to the units, not as neologisms but as language units. According to Torner (in 
press), a study for the lexicographic sanctioning of neology should consider this 
dual dimension and observe neological forms from this two-fold perspective. The 
dictionarisability of neology is a dual property acting on a two-fold plane: that of 
consolidation in use on the one hand, and that of the criteria governing the elabora- 
tion of dictionaries on the other (Torner in press). 


6 Conclusions 


In her magnificent work published in 2015, Kerremans compared neologisms to 
casting show winners: some become stable or consolidated as singers, others get to 
have a hit, yet most fall into oblivion. The television industry provides a context 
within which they can gain huge popularity within a very short space of time, but 
as the focus of the industry’s interest shifts, the artists’ popularity may quickly fall. 
Some manage to keep going for a while, while others manage to break into the in- 
dustry without even winning the contest, so there does not appear to be a recipe for 
guaranteed success (Kerremans 2015: 15). 

Indeed, the most dictionarisable neologisms are those with certain characteris- 
tics, yet reality has shown us time and again that many of the neologisms that fulfil 
those seemingly essential characteristics may not become stable, while others that 
do not fulfil them may. 

The pandemic has mobilised vocabulary in an unprecedented way, as noted by 
Pons (2020) and, just 20 days into the first lockdown, words that had not previously 
existed began to appear, words that had not been used for a long time were revived 
(lexical resurrection, according to Pons, such as the verb desescalar ‘de-escalate’), 
or a new sense or a more specific meaning was given to words already in use. 

We do not know how many neologisms have been created since the start of the 
pandemic, but there are undoubtedly many more than the 209 analysed in this work, 
based on the Neology Observatory’s extraction of neologisms from oral and written 
texts. Such extraction has been performed annually since 1989. It provides a
snapshot of how the lexicon of the language has developed to adapt to the changes in
society. However, that extraction is not systematic, and although the most frequent 
neologisms have been detected because of their recurrent appearance in the press, 
many of the more fleeting ones have not. Had they been detected, the latter would 
have considerably enlarged our corpus of pandemic neologisms. Nonetheless, with 
the corpus available to us, we have been able to see that new words did appear, 
others were reborn, and some of the already existing ones have taken a new path. 

Looking at the corpus from a lexicographic perspective, we divided this new 
vocabulary into three blocks. In the first, we found good examples of speakers’
creativity in terms of meeting their more expressive and less denominative needs with
nonce words, which performed their function yet held no lexicographic interest. In 
the second, we analysed a set of neologisms midway along the neological process, 
which could not be deemed stable in use and, therefore, were in the antechamber 
of dictionarisation; the path that these might ultimately take is unknown. And, in 
the last block, we observed those neologisms that had already completed their jour- 
ney; some have already been lexicographically sanctioned, and others may be in 
due course. 


Andreína Adelstein, Victoria de los Ángeles Boschiroli
Spanish neologisms during the COVID-19 
pandemic: Changing criteria for their 
inclusion and representation in dictionaries 


1 Introduction 


The COVID-19 pandemic is a global event in a globalized society, and in many ways
unprecedented. One of these ways is that, as of August 2021, it is still an ongoing
phenomenon, so any analysis or description is provisional and/or contingent. Another
feature is the immediate, urgent, and changing nature of events. Nevertheless,
scientific research in different areas has had to speed up its processes in order to
achieve results that have social impact; among these areas are linguistic description
and lexicographic recording.

The urgent need to account for this extraordinary reality as expressed in language,
especially in lexical creativity, can be observed in the updates of language
dictionaries in 2020 and the choice of words of the year, as well as in the proliferation
of an unusually large number of individual or institutional inventories (for Spanish,
for example, COVIDCIONARIO, Barale 2020, Lungevity Foundation 2021) and of stories in
the mainstream press, ephemeral publications and postings on social media where
analysis and reflections are outlined with varying degrees of expertise. This was also
the case in academic works describing such issues as productive resources or
relationships between different languages, among others, which have multiplied since
the end of 2020 and throughout 2021 (see Zholoboba 2021, Baharati 2020, Klekot 2021,
Haddad/Moreno Martínez 2020, Mweri 2021, Carpintero/Tapia Kwiecien 2020).

In this context, where establishing a corpus of analysis can be a particularly
difficult task (due to the seemingly unstoppable surge of new words that have
been appearing in parallel to the different phases of the pandemic, scientific
advances and social reactions to government health policies, and to the global nature
of the creative phenomenon), it is worth studying whether the criteria traditionally
applied to include and treat neologisms in different types of dictionaries have changed
in any way (see Barnhart 1985, Bernal/Freixa/Torner 2020, Cook 2010, Ishikawa 2006,
Klosa-Kückelhaus/Wolfer 2020, O’Donovan/O’Neill 2008).
The aim of this work is to describe the criteria used in the inclusion and
treatment of neologisms in dictionaries of Spanish within the framework of pandemic
instability. Our starting point will be data obtained by the Antenas Neológicas
Network¹ (https://www.upf.edu/web/antenas), whose representation in three different
lexicographic tools will be analyzed with the purpose of identifying problems in the
methodology used to dictionarize neologisms during the COVID-19 pandemic, that is, in
how and which words were selected for inclusion in dictionaries and how they were
represented in their entries (sources and corpora of analysis, selection criteria,
types of definition, among other aspects). Two of the tools are monolingual, and
COVID-19 lexical units were included as part of their updates: the Antenario, a
dictionary of neologisms of Spanish varieties, and the Diccionario de la Lengua
Española [DLE], a dictionary of general Spanish (published by the Real Academia
Española [RAE], Spanish Royal Academy). The other is a bilingual unidirectional
English-Spanish dictionary first published as a glossary, the Diccionario de COVID-19
EN-ES [TREMEDICA], entirely made up of neological and non-neological lexical units
related to the virus and the pandemic. Thus, the target lexis was either included in
existing works or makes up the whole of a new tool located in a portal together with
other lexicographic tools. Unlike other collections of COVID-19 vocabulary that kept
cropping up as the pandemic unfolded, all three have been designed and written
according to well-established lexicographic practices.

Our working hypothesis is that the need to record and define words which were
recently created impacts the criteria for inclusion and treatment of neologisms in
dictionaries of Spanish, and leads to a certain degree of overlap of some features
which are traditionally thought to be specific to each type of dictionary.

To this end, we will start by describing some of the most salient characteristics 
of the lexis of the COVID-19 pandemic in Spanish. Then, we will analyze the three 
lexicographic works. We will look at their headword selection procedures and how 
words are treated, in particular, with regard to what definition resources they de- 
ploy and how variation is recorded. Finally, we will discuss our conclusions about 
the peculiarities of the methodology found to be used in the inclusion and treat- 
ment of neologisms related to the pandemic. 


1 The Antenas Neológicas Network, created in 2003, is one of the networks associated with the
Observatori de Neologia of the Institut Universitari de Lingüística Aplicada, Pompeu Fabra University,
whose purpose is to collect neology in order to describe the varieties of some Latin American coun- 
tries, in addition to that of Spain. The European node is the Observatorio de Neología of the Univer- 
sidad Pompeu Fabra, which registers neologisms of newspapers published in Barcelona but that 
have national circulation. The Latin American nodes are research teams from: Universidad Nacio- 
nal de General Sarmiento (Argentina), Universidad de Concepción and Pontificia Universidad Catól- 
ica de Valparaíso (Chile), Colegio de México (Mexico), Universidad Autónoma de Manizales 
(Colombia) and Universidad Femenina del Sagrado Corazón (Peru). 


2 Neology of the COVID-19 pandemic: Main
characteristics and impact on lexicography 


The pandemic has had an exponential impact on the neology of national languages in
every field of human activity. Many of the neologisms are, in fact, internationalisms
(coronavirus, COVID-19), which can be considered, to a large extent, an extreme
case of what have been called global linguistic variants (Sayers 2014, Buchstaller
2008, apud Sayers 2014), that is, linguistic innovations that emerge simultaneously
in very distant places, such as, for example, semantic neologisms like aislamiento,
confinamiento, cuarentena (all three referring to ‘lockdown’) in different varieties of
Spanish, or microgota (Spanish), microdråpe (Norwegian), microgoccia (Italian) (all
of them equivalents of ‘microdroplet’). From a lexicographic perspective, these
global variants are likely to be included in different types of dictionaries covering
the phenomenon, given the frequent use the media have made of them.

As a matter of fact, the lexis of COVID-19 in Spanish, as has been the case in
other languages, includes lexical units that have a different diachronic status:
revitalized words with a relatively low frequency of use, which were either
already found in general language dictionaries (barbijo ‘face mask’) or had not
been included before the pandemic (coronavirus); non-neological terminological
units that became frequent in everyday discourse (carga viral ‘viral load’, oxímetro
‘pulse oximeter’) and terminological neologisms that were rapidly taken up by the press
(supercontagiador ‘super-spreader’); denominative and/or stylistic neologisms from
different fields and styles (zoompleaños ‘Zoom birthday party’, covidiota ‘covidiot’);
potential words or occasionalisms, of little (coronabicho ‘coronabug’) or no use
(coronahijo ‘coronachild’).

How these words are recorded and treated lexicographically depends, of course,
on the type of dictionary: language dictionaries will include items of almost all of
these kinds (except for, perhaps, occasionalisms); language dictionaries of
neologisms will also add those with a certain degree of diffusion; non-institutional or
occasional glossaries (some of which claim to be “dictionaries” despite not following
rigorous lexicographic practices) include mostly stylistic neologisms, ephemeral
neologisms or occasionalisms. For example, the COVIDCIONARIO has, among others,
coronabirra ‘cocktail party during lockdown’, coronamiento ‘corona lie’; the Diccionario
Latinoamericano de la lengua española features coronabobo ‘coronamoron’,
coronamor ‘coronalove’, coronanoico ‘corona paranoid’, covicheado ‘COVID infected’.² These
informal records have had an unusual role in Spanish lexicography, which is
discussed below.


2 The phenomenon described in ten Hacken/Koliopoulou (2020: 129) seems to have multiplied and 
is repeated throughout the globe: “New words are always marked. This is illustrated by the publica- 
tion of lists and discussions of words in newspapers, which are attested in many languages”. 
A feature of particular relevance for lexicography is neological productivity, in
terms of the productivity of neological processes (formal, semantic or loans),
morphological productivity (productivity of affixes) or productivity of results
(frequency of tokens). A quick look at the more than 300 neologisms recorded in 2020 by
the Antenas Neológicas Network shows that the most productive processes have been
syntagmatic compounds (cuarentena intermitente ‘intermittent lockdown’, barbijo
social ‘non-medical face mask’), prefixation (postpandemia ‘post pandemic’,
precuarentena ‘pre lockdown’), suffixation (hidroalcohólico ‘alcohol gel’ adj., sanitizar
‘to sanitize’), acronymy (covidivorcio ‘COVID divorce’, zoompleaños ‘Zoom birthday
party’) and loanwords (coronacrash, zoomer). However, neologisms such as
coronacrisis and coronabullying, which made their way into Spanish soon after they were
coined in English, may be perceived as originally Spanish acronyms rather than as
loanwords. In some cases, it can be hard to decide whether a new word is a calque
or an item formed in accordance with the morphological rules of Spanish. This is
the case of supercontagiador ‘super-spreader’ and microgota ‘microdroplet’, which
defy easy classification as either calques from English or derived words. On the other
hand, as regards regional variation, the use of lexical variants which belong to a
certain national variety by the press from a different region or country (which often
happens when international news stories are translated and reproduced) tends to
reinforce pan-Hispanic practices, despite (and to the detriment of) the pluricentric
character of the language. Thus, depending on the country, different names have
been adopted or are preferred to designate social isolation measures: confinamiento
‘confinement’ in Spain, or aislamiento ‘isolation’ and cuarentena ‘quarantine’ in
Argentina and, to a lesser extent, Chile, Mexico, and Peru.

This raises the following questions: In what type of dictionary and to what
extent should variants of syntagmatic compounds, such as inmunidad comunitaria,
inmunidad de rebaño, inmunidad colectiva, inmunidad de grupo (‘herd immunity’), be
treated? What about those that make up a derivational paradigm, such as barbijo
social ‘non-medical mask’, barbijo quirúrgico ‘surgical mask’, barbijo casero ‘DIY
mask’? How should neologisms that were created to address a phenomenon which is
specific to this pandemic, but that may have a more general reference, such as
reconfinamiento ‘new lockdown’ or desconfinamiento ‘lifting of lockdown’, be treated?

Summing up, so far the features of Spanish neology about the pandemic that
may have a bearing on the criteria for lexicographic treatment have been found to
be: (i) global variants and the influence of calques on speakers’ perceptions (calques
may be perceived as formed according to the rules of the speakers’ own language, Klekot
2021); (ii) variants from one variety of Spanish that end up being used in
others (e.g. desconfinamiento ‘lifting of lockdown’, originally coined in Spain and
later used elsewhere); (iii) a high degree of terminological banalization in different
everyday fields; (iv) a high degree of denominative, but also expressive, neology,
linked to the ephemeral or occasional use of the word; (v) high productivity of
acronymy, especially with corona-, COVID-, cuarent-,³ linked to stylistic neology,
hence occasional or ephemeral (and, as a result, unlikely to be included in general
language dictionaries), as Navarro (2020) points out; (vi) changes in the way words
circulate: words which have not been used much still get attention and diffusion
through non-institutional lexicographic records (e.g. covidiota ‘covidiot’).


3 Theoretical framework 
3.1 Neologisms and dictionaries 


Neologisms are usually defined as new words; their novelty may lie in different as- 
pects of the lexical item: morphosyntactic, such as aplausazo ‘communal clapping’; 
semantic, such as confinamiento ‘lockdown’; linked to loanwords, such as pandemial 
‘born during the pandemic’. Their neological nature may be determined through dif- 
ferent parameters, which have been the object of many studies, especially in the 
Romance languages tradition (Barnhart 1985, Boulanger 1979, Cabré 2002, 2016, 
Cook 2010, Guilbert 1975). Among these, the most widely cited criteria are the
chronological (when they were first coined or recorded), psycholinguistic (speakers’
perception of novelty), lexicographic (their inclusion in dictionaries) and formal instability
(variation in their written or spoken renderings). 

Schmid’s definition (2008: 1) foregrounds an aspect of neologisms of particular 
interest when considering their inclusion in dictionaries, their “in-process” status, 
that justifies the claim that not every neological item can or should be included in 
dictionaries (i.e., not just in general language dictionaries, but also dictionaries of 
neologisms): 


Neologisms are not simply ‘new words’. Rather, at least in theoretical terms, they are words
which have lost their status as nonce-formations and are in the process of becoming or already
have become part of the norm of the language [. . .], but are still considered new by most
members of a speech community (Fisher 1998, 3; Hohenhaus 2005, 365). This of course implies
that a word may be a neologism for one language user and familiar to another, and that in the
absence of clues provided by the speaker signalling the newness of the word . . . hearers will
be unsure whether either they are confronted with a new word or an existing word unfamiliar
to them.


In connection to this, Adelstein/Boschiroli (2020: 296) discuss the paradoxical
nature of neologisms as lexical units and how it affects lexicographic typology, which
can be summed up as follows: (i) a neologism is not a full-fledged word, but must
have the necessary conditions to become one in the future, (ii) the paradox also
manifests in the fact that a neologism may be the creation of an individual speaker,
but it is only through its use by a speech community that it acquires its neological
status, and (iii) in the case of pluricentric languages such as Spanish, a lexical unit
may cease to be a neologism in one country but still be one in others.


3 They could even be thought of as affixes, as some authors have suggested about -gate or -landia
(‘-land’), which the DLE describes as a compositional element.

Furthermore, distinctions have been drawn between types of neologisms based
on the extension of their social use, as well as between different types of neology:
Guilbert's classical distinction between discourse and language neology (1975),
Cabré's opposition between ephemeral and lasting neology (1989), and the distinction
between neologism and occasionalism proposed by Dressler (1993), apud Mattiello
(2016: 115). These distinctions tend to suggest that only those neologisms that
spread beyond the personal or occasional sphere of an individual speaker should
be included in general language dictionaries.

Dictionaries of neologisms are characterized in dictionary typologies as restricted,
mostly on account of chronological considerations.⁴ They are language dictionaries
that have a two-way relationship with general language dictionaries,
which play a crucial role when determining the neological nature of a word. On the 
one hand, general dictionaries are used as reference points: a unit will be consid- 
ered neological if it is not found in a lexicographic exclusion corpus (that is, the set 
of dictionaries used to corroborate whether the item is documented). On the other, 
once the headword list of a dictionary of neologisms has been drawn, inclusion in 
the general dictionary is still a central goal: the neologisms chosen for a dictionary 
of neologisms are likely to be included eventually in a general language dictionary. 
In other words, first, the general dictionary is an instrument that legitimizes the 
neologicity of words that will be included in the dictionary of neologisms, and sec- 
ondly, dictionaries of neologisms are instruments that can be used to update gen- 
eral dictionaries. 
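
The reference-point role of the exclusion corpus described above can be illustrated with a minimal sketch. It assumes that each dictionary is reduced to a plain set of lowercase headwords; the dictionary names and their contents below are toy data invented for illustration, not the actual exclusion corpus of any observatory.

```python
def is_lexicographic_neologism(candidate, exclusion_corpus):
    """Return True if the candidate is absent from every dictionary
    in the exclusion corpus (formal check on the headword only)."""
    form = candidate.casefold()
    return all(form not in headwords for headwords in exclusion_corpus.values())


# Hypothetical headword lists (illustrative only, not real dictionary contents).
exclusion_corpus = {
    "general_dictionary_a": {"confinamiento", "cuarentena", "mascarilla"},
    "general_dictionary_b": {"cuarentena", "pandemia"},
}

candidates = ["aplausazo", "cuarentena", "covidiota"]
neologisms = [w for w in candidates if is_lexicographic_neologism(w, exclusion_corpus)]
print(neologisms)
```

A check of this kind only captures formal neologisms. Semantic neologisms, where an already documented headword acquires a new meaning, would require comparing senses rather than headwords.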

Adelstein/Boschiroli (2020) identify three characteristics of dictionaries of
neologisms. They are “transition devices”, since some of the lexical units they collect
hold a special status: from a chronological point of view they are likely to be
leaving their continuity stage and entering their final stage in their condition of
neologism, in Anula Rebollo's (2010) terms; “remedial devices”, since they include words
that may not be neological from a chronological or psycholinguistic point of view,
but which are neological from a lexicographic perspective; and “documents”, since
they include words that may prove to be ephemeral and thus may never reach the
status of institutionalized words. We will come back to these properties and review
them after our analysis, to establish whether they are exclusive to dictionaries of
neologisms in Spanish as regards COVID-19 vocabulary.


4 In this work we do not consider the multiple online collections recording ludic or occasional cre- 
ations, most of which do not follow lexicographic criteria nor base their contents on accurate lin- 
guistic descriptions of the units. 
3.2 Criteria for inclusion of neologisms in dictionaries


The process of including new words in a dictionary has usually been discussed
almost exclusively in terms of the updating of general language dictionaries (see e.g.
Barnhart 1985, Ishikawa 2006, O’Donovan/O’Neill 2008). Among the most cited
criteria, we can identify stabilization (as opposed to the ephemeral character of
neologisms), frequency of use (as opposed to hapaxes), dispersion of occurrence (as
opposed to high frequency in a limited range of textual types) and, on the other
hand, the witness nature of new words (Matoré 1953) and the need for naming that
drives the creation of new words. Calculations to articulate criteria have also been
proposed, e.g., Barnhart (2007), Metcalf (2002), and Cook (2010).

As regards Spanish, on the premise that frequency of use is an a priori criterion 
for inclusion in dictionaries, Adelstein/Freixa (2013) study how neology observato- 
ries can contribute to the process of lexicographic update, concluding that a suit- 
able proposal should take account of the different dimensions of lexis and combine 
formal (variants of forms previously included in dictionaries, formation rules, re- 
strictions of the base and other elements), semantic (degrees of polysemy, polysemy 
production) and sociolinguistic (stability of use, extension of use, and naming 
needs), besides lexicographic, criteria. However, the chronological criterion is not
made explicit; it is subsumed in the sociolinguistic criterion of stability.

Freixa/Torner (2020) analyze dictionarization of neology in Spanish by carrying 
out a comparative study of data in connection to changes of frequency of neologisms 
over time and speakers’ perceptions about their novelty. Adelstein/Boschiroli
(2020) discuss criteria for inclusion of neologisms in neology specific dictionaries 
from a pluricentric, non-panhispanic perspective of Spanish. 

Within Spanish lexicography, the issue of how the RAE includes new words in 
the DLE (often referred to as “words accepted by the RAE” by Spanish speakers at 
large) has been the focus of Bernal/Freixa/Torner (2020). They analyze criteria im- 
plicit in the inclusion of words in the DLE by focusing on neologisms with a high 
degree of frequency. Frequency of use is found to be necessary but not sufficient: 
other factors related to the internal coherence of the dictionary, such as completing 
derivative series and lexical sets (especially specialized lexis), representing geolec- 
tal variants and orienting normative use often take precedence. Words created in 
accordance with Spanish word formation rules are favoured over borrowings (see 
also Klosa-Kückelhaus/Wolfer 2020: 151). Another important factor is the inclusion
of words that were created to satisfy naming needs, such as words related to new 
technologies or realities. Both internal coherence and naming needs seem to have 
been central in the 2020 update which includes words related to the COVID-19 pan- 
demic, as will be discussed below. 
4 Methodology


Our starting point in the lexicographic analysis of criteria for inclusion and
microstructural treatment of neologisms is a list of 321 neological items detected and
recorded during 2020 and 2021 by the Antenas Neológicas Network.⁵ These data are
collected exclusively from the written press of the six countries that make up the 
network; this may be regarded as a limitation in terms of diaphasic variation in re- 
lation to pandemic vocabulary, but on the other hand, it guarantees a certain de- 
gree of institutionalization, which is an essential aspect when considering the 
inclusion of new words in a general language dictionary. 

The following information about the number of recorded occurrences, disper- 
sion of use in all the countries and formation processes has been found to be rele- 
vant when analyzing criteria for inclusion in dictionaries: 

— Total amount: 321 

— Number of hapaxes: 96 

— Number of items which were recorded in all six countries: 26 

— Number of neologisms which were recorded or are being compiled in the dictio- 
naries studied: 13 (DLE), 63 (Antenario), 87 (TREMEDICA). 
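
To make the proportions behind these figures easier to compare, the short sketch below expresses each count as a share of the 321 items. It assumes that the per-dictionary counts refer to items from this same list, which the text suggests but does not state outright; the variable names are illustrative.

```python
total_items = 321

# Counts taken from the list above; labels are illustrative.
counts = {
    "hapaxes": 96,
    "recorded in all six countries": 26,
    "in DLE": 13,
    "in Antenario": 63,
    "in TREMEDICA": 87,
}

# Share of the full list, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total_items, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total_items} ({pct}%)")
```

On these figures, hapaxes make up just under a third of the list, while items attested in all six countries account for roughly one in twelve.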


In order to verify which items were exclusive to the pandemic (i.e., whose referents
belong to the pandemic and are not revitalized forms or lexicographic neologisms
from previous years), the following sources were checked: Corpus del español NOW
[NOW] by Mark Davies (2012-2019)⁶ and Corpus del español del siglo XXI [CORPES],
updated in 2021, which is 40% press texts. This information should condition
representation in the microstructure. For instance, coronavirus was first recorded in the
CORPES in 2006,⁷ which means it would only be neological in the SARS-CoV-2
meaning.⁸ Its high frequency of use during the pandemic calls for the lexicographic
inclusion of this originally specialized item in all its senses. The words
documented in NOW belong to texts collected before 2019; therefore, words identified
as pandemic vocabulary which appear there should have a non-exclusive treatment:
some examples are aerosolización ‘aerosolization’, aerosolizar ‘to aerosolize’,
aislamiento sanitario ‘sanitary isolation’, aislamiento social ‘shielding’, alcohol en
gel ‘alcohol-based gel’.


5 This network follows the same methodology and uses the same limited-access online platform to
enter the relevant information about the neologisms detected from the main newspapers of the
countries of the network (data about grammar, sources and type of neological formation) as the
rest of the observatories and networks related to the Observatorio de Neología of the Universidad
Pompeu Fabra (cf. https://www.upf.edu/web/antenas/metodologia). The results are later published
in the open-access lexical database BOBNEO (http://obneo.iula.upf.edu/bobneo/index.php). A
lexicographic criterion is applied for identification: the items recorded have not been included in
the dictionaries that make up the exclusion corpus for each country or region, while every node
checks the words against DLE and LEMA
(https://www.upf.edu/web/antenas/corpus-lexicografico-de-exclusion).

6 Corpus NOW has about 7.2 billion words of data from web-based newspapers and magazines
from 2012-2019.

With the aim of determining whether the users’ perspective (i.e. the needs of general
users) was one of the criteria when considering the inclusion of new items, we
focused on the number of searches of individual items made by users of the DLE
between August 2020 and August 2021, as recorded in the “Registro de consultas
al diccionario de la lengua española”
(https://enclave.rae.es/herramientas/registro-de-consultas-al-diccionario-de-la-lengua-espanola-dle).
These searches can also be considered an index of the degree of institutionalization
of the items in the framework of the pandemic.


5 Analysis 


The analysis of the lexicographic representation of COVID-19 neologisms, whose ob- 
jects have changed in terms of their properties and the methodology used to study 
them, can be approached from two perspectives: (i) the neological processes them- 
selves and how they are recorded and (ii) how neologisms have been dictionarized. 
In this section we will focus on the latter. 


7 Coronavirus (absolute frequency: 1.380; documents: 726; normalized frequency: 4,12 cases per million)

Period       Freq.    Fnorm.
2016-2020    1.300    30,99
2011-2015    38       0,46
2001-2005    34       0,33
2006-2010    8        0,07


8 These, however, are limited results: none of the following are recorded: acuarentenamiento,
aerosolización, aerosolizar, antiCOVID, anticuarentena, antipandemia. There are 5 cases of
cuarentenar, 2 of which are wrongly labelled as verbs, and 3 of antivacunas, just 1 from 2020.
5.1 Bilingual unidirectional dictionary: TREMEDICA


The Diccionario de COVID-19 (EN-ES) [TREMEDICA] is an online one-way bilingual
dictionary first published in May 2020 as a glossary (Glosario de COVID-19 EN-ES) on
the website of TREMEDICA (https://www.tremedica.org/), an international organization
that groups together translators and writers specialising in medicine and
health care. Version 2.01 (June 2021) has 6,153 headwords. The reason for publishing
a glossary barely two months after the pandemic had been declared was, according
to the authors, to record “[not only] the spontaneous creation of many neologisms
in social media, but also the widespread use of a large amount of technicisms in
texts of all kinds” (Saladrigas et al. 2020: 110-111). Although bilingual, it is the
largest systematic lexicographic collection of COVID-19 vocabulary in Spanish, which,
as will be discussed below, has had a probably unintended impact on monolingual
lexicography.

TREMEDICA collects “basic terminology around COVID-19 in English” covering
different aspects related to the pandemic, including lexis created and popularized in
social media, to provide Spanish equivalents. This means that, on the one hand,
neither the English headwords nor their translations are always neological, and on the
other, some of the equivalences are proposals which, as is often the case in bilingual
dictionaries, do not claim to have been attested in use. The fact that the dictionary
does not necessarily reflect actual language use in Spanish may be seen as a
shortcoming from a linguistic point of view. Nevertheless, the dictionary clearly serves,
and has served, an extremely useful practical purpose for its intended users
(translators, interpreters, journalists and other writers, especially science writers):
given its breadth and depth of coverage, as well as the lack of other reliable
lexicographic works on the subject (Navarro 2020: 790), it is a crucial reference tool
that helps to organize and guide lexical choices at a time when this is much needed.
Unlike the monolingual dictionaries explored, the dictionary has a functional,
user-oriented focus (Tarp 2008: 47): it is mostly aimed at production and translation
from English by professionals belonging to a specialized field. Therefore, although
normative issues are addressed, especially through usage notes, communicative
considerations seem to take precedence. This affects both macrostructural and
microstructural decisions.


5.1.1 Neologisms in the Macrostructure of TREMEDICA 


The headword list cannot be accessed in full from the homepage, but a sample of
810 entries, which gives a good insight into the longer list, is available as an
“[a]bridged glossary of COVID-19 terms (en-es)” (Saladrigas et al. 2020). This covers the
lexis of “the molecular biology of coronaviruses, clinical features of COVID-19,
coronavirus detection tests, diagnostic imaging tests, protective equipment, and the
COVID-19 vaccines being developed, as well as unusual neologisms, with particular
emphasis on terms that are difficult to translate” (Saladrigas et al. 2020: 111). There
is no explicit explanation regarding the sources from which the English headwords or
Spanish equivalents have been extracted, nor any specification as to criteria
for lemma selection, other than relevance to the target user. It may be assumed the
source texts and corpora are listed under “Bibliografía” (‘Bibliography’), though it
is not clear how they are used other than in the entries where examples are provided
(see section 5.1.2. below). Most of the headwords and their equivalents are, in fact,
either terminological (e.g. alveolar exudate ‘exudado alveolar’) or related to health
care (death toll ‘número de muertos’), not neological for the field, and unlikely to
have been collected from non-specialised texts; these are clearly addressed mainly
to the kind of professionals identified in the front matter; hence we will not focus
on them here. However, there is a large group of headwords which were either
coined (e.g. corona bonds ‘coronabonos’) or popularized (e.g. anti-vaxxer
‘antivacunas’) during the pandemic and have relevance beyond the medical fields. Since
TREMEDICA is unidirectional, strictly speaking there is no Spanish headword list;
however, both English headwords and Spanish equivalents can be accessed from
the same search box and are given the same label, “término” (‘term’). As will be
seen when the microstructure is discussed, both English and Spanish units are
analyzed and explained in the entries.

Both in the English headword list and in the Spanish equivalents there seem to 
be few restrictions as to the type and form of neological lexical unit presented: 

- in terms of length, there are single and multiword units, both in English and 
Spanish (see examples above) 

- in terms of formation, there are words derived by affixation, composition, and 
acronyms (infector ‘contagiador’, infoveillance ‘infovigilancia’, superspreader 
‘supercontagiador’) 

— in terms of type, there are examples of different kinds of borrowings: semantic 
loans (contact, ‘contacto’), calques (plandemic ‘plandemia’) and adapted loan- 
words (anti-mask ‘antimascarilla’) (Marello 2020: 170). 


This wide range of forms and types seems consistent with a production-focused
approach, characteristic of bilingual dictionaries in general and specialized
dictionaries in particular, which tend to pay more attention to users' communicative
needs than monolingual general language dictionaries do. There is one major exception,
however: in accordance with RAE normative recommendations, direct loanwords used
in texts in Spanish tend to be avoided, even in cases where the borrowed variant is
definitely more frequent than the calque (‘fake news’, ‘homeschooling’); so are
calques which have been discouraged by RAE itself (see discussion on sanitizar
‘sanitize’ below). However, these represent a small percentage of words associated with
the pandemic. Overall, there is a conception of lexical unit that takes account of the
role of multiword units in the lexis.
As pointed out earlier, equivalences are often translation solutions proposed by
the dictionary, anticipating probable needs of professional users, rather than actual
uses. This happens frequently when the headword is a multiword unit, where, as is
common practice in bilingual lexicography, paraphrases are given, especially when
no equivalents are available. This is, for example, the case of corona-shame,
translated with the near equivalent definitional paraphrase “recriminar (una conducta
que podría favorecer el contagio del coronavirus)” ‘to reproach (someone for a
behavior that could contribute to spreading coronavirus)’.

In some cases, equivalents coined in accordance with word formation rules of
Spanish that result in calques from English are given, although, as is the case with
corona-snitch, these have not been found to be used in Spanish texts (see Figure 1).⁹


corona-snitch 

coronachivato; coronasoplón; coronabuchón”" 

Sinonimia (es): covichivato. 

Concepto: persona que avisa a la policía para denunciar las covidioteces del 
coronaburro de turno (→ covidiot).


Ejemplo: Corona snitches: Police create web portal to inform on neighbours 
(Zindulka, 2020) <Coronachivatos: la policía ha creado un ciberportal para 
delatar a los vecinos>. 


Tremédica-Cosnautas 2020


Figure 1: Corona-snitch entry (TREMEDICA). 


As is to be expected, given the circumstances under which the dictionary was
compiled, some of the proposals were either not taken up or not used in all geolectal
varieties (when diatopic variation is not signalled, the equivalence may be assumed
to work for all varieties, which is not always the case), and others emerged which
seem to have become more widespread. This is the case of fever clinic, for example:
the equivalent given is “puesto de detección (temprana) (del coronavirus)”
‘(coronavirus) (early) detection center’, perhaps an early paraphrase solution. However,
the equivalent “unidad febril”, used in Argentina, is not provided, probably because
it had not been coined yet when the entry was first published.


9 “Concept: person who warns the police about the covidiocies of the latest coronamoron (see cov- 
idiot)” (our translation). 
Some items which are not presented as equivalents, but as synonyms, may be
regarded as interchangeable by the user, as will be discussed in the following section.
Overall, it is clear that a user-oriented approach overrides other considerations and
allows for the inclusion of neologisms with different degrees of institutionalization.


5.1.2 Neologisms in the microstructure of TREMEDICA 


For every headword in English (in blue italics), TREMEDICA offers one or more 
equivalents in Spanish (in black bold type, see Figures 1-3); this is the only piece of 
information which appears in every entry in the dictionary. Often, besides the head- 
word and the equivalent, English — “Sinonimia (en)” — and/or Spanish — “Sinonimia 
(es)” — synonyms are provided; these, as in the example of essential workers, can 
also work as equivalents of the lemma (see Figure 2). 


essential workers 


personal esencial; trabajadores esenciales (o trabajadoras 
esenciales) 


Sinonimia (en): key workers. 

Sinonimia (es): trabajadores (o trabajadoras) de actividades esenciales. 
Ejemplo: Another concern, depending on how the spread of the virus evolves 
could be high rates of absentees among health care workers and other 
essential workers (Washburn, 2020) <Otro motivo de preocupación, según 
cómo evolucione la propagación del virus, podrían ser las elevadas tasas de 
ausentismo de los profesionales de la salud y otros trabajadores esenciales>. 


Tremédica-Cosnautas 2020


Figure 2: Essential-workers entry (TREMEDICA). 


Thus, for every headword, an entry may suggest several equivalents, which, as
suggested earlier, provides the user with different alternatives. Sometimes these are
geolectal variants, such as bulodemia (marked ES because it is only used in Spain) in
infodemic¹⁰ (see Figure 3):


10 “NOTE (Spanish) epidemic (or pandemic) of disinformation which results from a combination 
of information overload, compulsive consumption of information and proliferation of fake news in 
highly alarming global situations, such as the COVID-19 pandemic. It is a colloquial word: Fundeu 
approves of the calque, but some object to it arguing information cannot be regarded as bad in 
itself.” 
Other kinds of variants, such as stylistic variants, are often included in two
other optional fields in the microstructure: “Concepto” (‘Concept’, see Figure 1) or
“Nota” (‘Note’, see Figure 3). In the case of neologisms, these fields are
alternatively used to:

— give a definition of the word, including when it has acquired a specific meaning 
during the pandemic (e.g. in de-escalation, cabin fever) 

— classify the word (e.g. as neologism in plandemic) 

- explain the origin and situation in which a word is used (in curbside, mask- 
hole), or even the referent (in elbow bump) 

- explain the use of the word (in precoronavirus, post coronavirus) 

- provide normative recommendations (in herd immunity) 

- provide grammatical information (in antivaxxer) 

- inform about synonyms (in anti-mask) 

—  cross-refer to other entries for further information (in mask) 

- provide stylistic variants (in corona, new normal, pandemic generation) 


infodemic 


infodemia 
Sinonimia (es): desinfodemia; bulodemia (ES).
Nota (es): epidemia (o pandemia) de desinformación desencadenada por la 
mezcla de → information overload, consumo compulsivo de información y
proliferación de → fake news en situaciones de gran alarma mundial, como la
pandemia de covid-19. Es término propio del registro coloquial; la Fundéu da 
por bueno el calco, pero otros lo critican por considerar que la información no 
puede considerarse algo malo en sí. 


Tremédica-Cosnautas 2020


Figure 3: Infodemic entry (TREMEDICA). 


These kinds of explanations, often extralinguistic, may apply both to the headword
and the equivalent and are particularly interesting in terms of how neologisms are
represented, because they show the instability and newness of the words, and the
additional difficulty involved in representing them for production (rather than for
comprehension): equivalents are not enough. To use them properly, users need
extralinguistic data to make informed choices regarding, among other things,
institutionalization matters such as style and degree of stability.
5.2 Monolingual dictionaries: Antenario and DLE


In this section we will present our analysis of two monolingual dictionaries, the An- 
tenario, a restricted language dictionary, and the DLE, a general language dictio- 
nary. First, we will describe the main characteristics of each dictionary and the 
criteria used to select headwords. Then, we will compare how neologisms are repre- 
sented in the microstructure. 


5.2.1 Antenario 


The Antenario is an online lexicographic dictionary of neologisms from six national
varieties of Spanish; it was launched in September 2018 and has published 20 new
entries every month ever since, allowing for a highly isomorphic representation
vis-à-vis the dynamicity of language. By July 2021, 753 entries had been published. Both
the headword list and the content of the entries are based on data about neologisms
used in news media, detected and collected since 2003 by the Antenas Neológicas
Network in Argentina, Chile, Colombia, Spain, Mexico, and Peru. The criterion for
detection of new items is lexicographic. For a detailed account of methodology and a
description of the microstructure, see Adelstein/Boschiroli (2020, 2021).

The possibility of monthly updates is a highly relevant feature for a dictionary 
of neologisms: whatever is published is not final and can be easily changed, which 
reflects the neological nature of the words. Finality is in fact often mentioned as 
one of the defining characteristics of a general dictionary — and also one of its main 
shortcomings. The online format also allows for the compilation of special issues, 
such as the one published at the end of 2020. 

Due to the pandemic, during 2020 the Antenas Neológicas Network undertook
targeted searches aimed at recording neology about COVID-19 in the member
countries. In December 2020 the Antenario published a special issue of 49 entries of
neologisms linked to the pandemic, reflecting the exceptional nature of the situation
we lived through during the year. Before the end of 2021, 18 new neologisms will be
published and more COVID-19 entries are expected to be included in 2022.


Neologisms in the macrostructure of Antenario 

There are two main conditions for choosing headwords for the Antenario: (i) as
mentioned above, they all come from data collected by the Antenas Neológicas
Network and (ii) as the DLE is updated every year (see section 5.2.2.), each candidate
headword is checked again against the DLE to make sure it has not been
included after it was documented in the Antenas bank. Criteria for selecting the
headwords are total frequency of use (number of occurrences in the member nodes
of the Antenas Neológicas Network as recorded in BOBNEO), the witness character of
the words (mot témoin, Matoré 1953; prominent word, Metcalf 2002) and the year they
were first recorded. Thanks to the adoption of these criteria, candidates are
guaranteed to have a certain degree of institutionalization, which is often quite high
(see Adelstein/Boschiroli 2021 for a description of how these variables have been
adapted since the Antenario was first published).

The fact that it is updated monthly, making it possible to reconsider the criteria 
for the compilation of the headword lists (as well as how the items are treated in 
the entries), has helped, in the case of the neologisms of the pandemic, to represent 
in a more realistic way the gradually changing nature of the productivity of resour- 
ces for lexical creation. 

The data bank of the Antenas Neológicas Network has recorded a large number of
what so far are considered occasionalisms (e.g. coronaburguer ‘corona burger’,
coronapizza, cuarentenauta ‘lockdown netsurfer’) and ephemeral neologisms, most
of them still hapaxes (cuarentenable ‘able to be locked down’, coronabulling ‘corona
bullying’, coronabus ‘bus for COVID-19 infected suspects’, covidiota ‘covidiot’,
poscoronial ‘post coronavirus’, adj), which were deemed unsuitable for publication
in the dictionary. However, given the dynamic nature of how entries are published,
if more cases were detected, they could be included in the future. For example,
there have been new records of covidiota or poscoronial used in a variety of texts,
which shows their distribution has changed (7 cases were documented: none of
those from 2021 are mere records of the word in inventories).

From a temporal point of view, the list of COVID-19 headwords has been growing 
since the first one was drawn. In December 2020 a special edition of 49 pandemic 
lexical items was published, based on a headword list extracted from neologisms
detected between March and July 2020.¹¹ Then, in April 2021, a short list of candidates
was selected to be included in the updates of the last months of 2021, which includes 
neologisms documented between August 2020 and April 2021. For 2022, the new 
headword list will include pandemic words recorded in 2021. 

The headwords were chosen according to the following criteria. First, raw
frequency: priority was given to neologisms with a high number of occurrences
(anticuarentena ‘anti lockdown’, infectadura ‘dictatorship of the infectologists’) and/or
documented in most of the network's member countries (aislamiento social
‘shielding’, home office, inmunidad de rebaño ‘herd immunity’). Second, geolectal or
graphic variants of the first choices were added, even if the number of occurrences
was low (like aplauso sanitario and aplausazo ‘communal clapping’, nueva
convivencia and nueva normalidad ‘new normal’, or post coronavirus and
poscoronavirus). Third, although probably ephemeral, some frequent colloquial neologisms
were included because they were considered to be witness words and are not
hapaxes: corona, covidivorcio ‘covidivorce’, zoompleaños ‘Zoom birthday party’.


11 These can be accessed here: https://antenario.wordpress.com/tag/pandemia-COVID-19/. Many
neologisms that had already been compiled were deleted from the original list because they were
included in the DLE's 23.4 version in November 2020.

Neologisms that will have been published by December 2021 were chosen with
different criteria. One concern was to complete either the derivational series of the
headwords published in December 2020 (e.g. prepandemia ‘pre pandemic’) or
semantic series (autoaislamiento ‘self-isolation’, autocuidado ‘self-care’, autoexamen
‘self-test’). A second was to include synonyms or regional variants that had not been
documented in other lexicographic tools (e.g. cubrebocas ‘face mask’). A final one was
to offer some of the most frequent items detected after the special edition was written
and published (coinfección ‘coinfection’, semipresencial ‘partly face-to-face’,
oxímetro ‘pulse oximeter’).


5.2.2 Diccionario de la lengua española [DLE] 


The Diccionario de la lengua española [DLE], published by the Real Academia Espa- 
ñola (RAE), is the monolingual general language dictionary of Spanish most widely 
searched by both native and non-native Spanish speakers. Its current 23rd edition 
(first published in 2014) is updated online once a year, around November. The main 
changes are the inclusion of new entries and the addition of new meanings or new 
information to published entries. The November 2020 update (see sample in
https://dle.rae.es/docs/Novedades_DLE_23.4-Seleccion.pdf) included at least 15 changes
related to the pandemic.


COVID-19 pandemic neologisms in the macrostructure of the DLE 

The number of changes related to the pandemic in the DLE may seem small when
compared to TREMEDICA, Antenas, or even the words the RAE itself recorded in April 2020
as the most searched in the early months of the pandemic. The changes were:

- inclusion of entries: coronavirus, coronavírico (‘coronavirus’ adj), COVID-19,
cuarentenar / cuarentenear / encuarentenar (‘to quarantine’), desconfinamiento (‘lifting
of lockdown’), desconfinar (‘to lift a lockdown’), desescalada (‘de-escalation’),
videochat (n ‘video chat’), videollamada (n ‘video call’), telemedicina (‘telemedicine’)

- changes in existing entries: barbijo (‘face mask’), confinado -da (adj ‘locked
down’), confinamiento (‘lockdown’, ‘confinement’), confinar (‘to lock down’),
cuarenteno -na (n ‘quarantine’), mascarilla (‘face mask’).


12 See https://www.rae.es/noticia/las-palabras-mas-buscadas-en-el-diccionario-durante-la- 
cuarentena. 

13 See Zoholobova (2021) for a detailed description of 2020 amends and inclusions in contrast with 
previous editions of the dictionary. 
If we look at the new inclusions, we can identify different situations regarding their
degree of neologicity:

- those that were coined and/or first documented during the pandemic: COVID-19

- those that existed before the pandemic but did not comply with conditions for
their inclusion, such as frequency or extension (see section 3.2.): the ones
derived from cuarentena, confinar and coronavirus

- those that had been long in use but had not been included yet: desescalada
(‘de-escalation’), videochat (n ‘video chat’), videollamada (n ‘video call’),
telemedicina (‘telemedicine’)


Something the vast majority of these items have in common is being formed from 
Spanish bases and morphemes and well-established rules of Spanish word formation. 
Even in the case of calques (such as videollamada and telemedicina), they may be in- 
terpreted as formed from Spanish bases, as the etymological information provided in 
the entries suggests. This is consistent with the RAE’s recommendations about the 
use of neologisms at large, as discussed in section 3.2. Regardless of frequency or
extension of use, the RAE has adopted a prescriptive stance and systematically
discourages or rejects the use of loanwords or even calques. A good example is sanitizar
‘to sanitize’ (included in the Antenario but not in TREMEDICA). It is a verb which has
been widely used during the COVID-19 pandemic and found in all sorts of registers 
(including government and other official texts), which was first recorded in CORPES 
in 2012 and is discussed in RAE’s Observatorio de Palabras (‘Observatory of Words’, a 
portal devoted to answering queries about words which cannot be found in the DLE). 
It is, by the RAE's own admission, one of the most frequently searched words during 
the pandemic (see note 10); however, its use is discouraged on puristic grounds (our 
translation): 


The verb sanitizar (from English “to sanitize”) has spread lately, especially in the Americas.
Despite this, it is advisable to avoid the word and its derivatives (sanitizado, sanitizante,
sanitización . . . ) and choose instead patrimonial words [i.e. derived from Vulgar Latin] such as
sanear, higienizar, limpiar or desinfectar. (https://www.rae.es/observatorio-de-palabras/sanitizar)


5.3 Neologisms in the microstructure 
of monolingual dictionaries 


The fact that the pandemic was an ongoing, unstable phenomenon when the dic- 
tionaries did their COVID-19 updates also impacts features of representation at 


14 All of these are documented as in use before 2019 in the press in NOW. CORPES documents pre- 
pandemic cases of coronavirus, cuarentenar, desescalada, videochat, videollamada and telemedicina. 
microstructural level. In the following sections we will focus on two of them:
definitions and treatment of geolectal variation.


5.3.1 Definitions and extension of reference 


One of the most interesting aspects regarding microstructural representation of 
pandemic neologisms is how the relationship between the novelty of the headword 
and the extension of meaning has been reflected. The fact that the items being rep- 
resented lexicographically are very recently created neologisms — even if some of 
them may have become highly frequent — requires defining words whose referential 
extension cannot be totally verified yet. The fact that some of these words are
revitalizations (barbijo ‘face mask’) or banalizations of non-neological technical terms
(aislamiento social ‘shielding’) contributes to this discrepancy between meaning
intension and extension.

Although some of the items were coined out of an apparent need to name some- 
thing specific in relation to the COVID-19 pandemic, the meaning can or could have a 
different extension. For example, although postcuarentena ‘post lockdown’ (included 
in Antenario) refers to a period after any of the COVID-19 lockdowns in 2020, this 
meaning may have a more general extension. Bearing in mind the componential na- 
ture of meaning, in abstract this word does not refer exclusively to a 2020 lockdown, 
since it could be used in the future, or even for similar lockdowns in the past. In 
other words, the neologism has been coined to name a particular situation but may 
later be used for other referents. 

Notwithstanding the obvious extensions of meaning every word can have in 
natural languages, we have observed the following strategies to overcome this diffi- 
culty in the monolingual dictionaries analyzed here, DLE and Antenario: 

a) Some definitions make no reference at all to the pandemic. In general, they 
seem to refer to words which are not neological from a chronological point of 
view, as queries in corpora such as NOW or CORPES attest, or are banalized 
technical terms. Examples of this can be found, among others, in alcohol en gel 
‘alcohol-based gel’ (and its synonyms) or supercontagiador supercontagiadora
‘super spreader’ in Antenario (see Figure 4). 


15 See supercontagiador supercontagiadora ‘super-spreader’, sense 2: “2. Adj Aplicado a una
persona infectada, que tiene la capacidad de contagiar el virus a un gran número de personas” (‘Of an
infected person, being able to transmit the virus to a large number of people’) and example 2 “El
hospital y la iglesia suponen por sí solos el 75 por ciento de los contagios del COVI -19 en Corea del
Sur, que vio multiplicados casi por 30 las infecciones desde el pasado martes, cuando dio positivo
la llamada “paciente 31”, una seguidora de 61 años de Shincheonji que las autoridades creen que
pudo actuar como agente supercontagiador y transmitir la enfermedad a decenas de personas. [El
Tiempo (Colombia), 24/02/2020]” (“The hospital and the church alone account for 75 percent of COVID-19
supercontagiador
supercontagiadora m y f y adj


Año de la primera documentación: 2020 


Definición 1 m y f Persona infectada por un virus que tiene la capacidad de 
contagiar a un gran número de personas. 
2 adj Aplicado a una persona infectada, que tiene la capacidad de 
contagiar el virus a un gran número de personas. 
3 adj Aplicado a una situación o un evento, que tiene condiciones, 
como ausencia de ventilación o elevada concurrencia, para que se 
produzca un gran número de contagios de un virus. 


Contextos «Un supercontagiador es un paciente que infecta a un número 
desproporcionado de contactos», explica. [Las Últimas Noticias 
(Chile), 25/05/2020] 


El hospital y la iglesia suponen por sí solos el 75 por ciento de los 
contagios del COVI -19 en Corea del Sur, que vio multiplicados casi 
por 20 las infecciones desde el pasado martes, cuando dio positivo 
la llamada «paciente 31», una seguidora de 61 años de Shincheonji 
que las autoridades creen que pudo actuar como agente 
supercontagiador y transmitir la enfermedad a decenas de 
personas. [El Tiempo (Colombia). 24/02/2020] 


El 10 de marzo, durante el ensayo de un coro de iglesia en el 
estado de Washington, se contagiaron el 37% de los presentes, 
cuenta Lea Hammer, epidemióloga del departamento de salud 
pública del condado de Skagit y autora principal de un estudio que 
advierte sobre eventos con potencial supercontagiador, donde una 
persona o un pequeño número de personas deja un tendal de 
infectados. [La Nación (Argentina), 25/08/2020] 


Nodos ARG] CHL coc ESE LEI EEN 


Figure 4: Supercontagiador supercontagiadora entry (Antenario). 


infections in South Korea, whose cases have multiplied almost by 30 since last Tuesday, when “patient
31”, a 61-year-old Shincheonji follower who is suspected to have been a superspreader agent who
transmitted the disease to dozens of people, tested positive”).
The only reference to the pandemic in the entry for supercontagiador supercontagiadora
(Figure 4) can be found in the examples (“Contextos”). The same happens in
some new entries or meanings in the DLE, as in the second sense of confinamiento
‘lockdown’, ‘confinement’: “2. m. Aislamiento temporal y generalmente impuesto de
una población, una persona o un grupo por razones de salud o de seguridad. El
Gobierno decretó un confinamiento de un mes.” (‘Temporary isolation of a community, a
person or a group, often externally imposed, for health or security reasons. The
Government has declared a one-month lockdown.’). A more indirect way to refer to the
pandemic is to include in the definition of a headword a word whose entry has an
example about the pandemic. For example, the second sense in confinado, da
‘locked down’ and the new entries desconfinar ‘to lift a lockdown’ (see Figure 5) and
desconfinamiento ‘lifting of lockdown’ all include the newly defined word
confinamiento. However, many of the entries, amendments or additions make no
reference at all to the pandemic, even when they are neologisms that are presumed to
refer exclusively to the COVID-19 lockdown (encuarentenar ‘to lock down’, COVID).


desconfinar 


1. tr. Levantar las medidas de confinamiento impuestas a 
una población, o a parte de ella, en un territorio u otro 
lugar. U. t. c. intr. y c. prnl. 


Figure 5: Desconfinar entry (DLE). 


b) In some definitions the extension to the pandemic or other phenomena linked to it
appears restricted with formulas such as “en especial...” or “especialmente”
(‘especially’) or similar structures (e.g. relative clauses), since, although the words
were created or revitalized during the pandemic, the reference is wider: in the
Antenario, aplausazo ‘communal clapping’ is defined as “Acción colectiva de apoyo y
reconocimiento, especialmente al personal de la salud, o de protesta, que consiste
en aplaudir simultáneamente durante un período determinado” (‘Collective action
of support and recognition, especially of health workers, or of protest, which consists
of clapping simultaneously for a set period’), or nueva normalidad


16 desconfinar v ‘to lift a lockdown’ 1. Tr Levantar las medidas de confinamiento impuestas a una
población, o parte de ella, en un territorio u otro lugar. U.t.c. intr y c. prnl. ‘To lift lockdown
measures imposed on a community, or part of it, in a territory or any other place. Also used as
intransitive and pronominal.’

17 An interesting aspect of the process of synthesis used in these definitions (referring to the noun 
confinamiento and not the verb confinar) is that they rely on use, rather than on the base. On the 
other hand, the addition of senses (in confinado, -da and confinamiento) results in a specialization 
of a meaning that is somehow included in the existing first sense, which highlights both the inade- 
quacy of the original definition, and the fact that it is a semantic neologism. 
‘new normal’ as “Situación posterior a una crisis que implica un cambio de hábitos
o expectativas en la sociedad, como la adopción permanente de medidas de
prevención e higiene en el marco de la pandemia de COVID-19” (‘Situation after a crisis
that involves a change in habits or expectations in society, such as the permanent
adoption of preventive and hygiene measures in the context of the COVID-19 pandemic’).


The DLE resorts to this kind of strategy indirectly only once, in the definition of
coronavirus (“Virus que produce diversas enfermedades respiratorias en los seres
humanos, desde el catarro a la neumonía o la COVID.”, ‘Virus that causes different
respiratory diseases in human beings, from the common cold to pneumonia or COVID’). The
reason for this may be that most of the DLE additions have a higher degree of
stabilization than those in the Antenario, due to, on the one hand, the different nature of
the dictionaries (general language vs. neologisms), and on the other, the more
conservative approach to new additions the RAE favours, as discussed in 5.2.2.1.

c) Some of the items refer to events that happened during the pandemic and the
definition reflects this, even when the componential meaning of the word could be
used in the future for other situations or referents. For example, in anticuarentena
‘antilockdown’ all three senses refer to a reaction against “las disposiciones
gubernamentales de aislamiento preventivo implementadas a causa de la pandemia de
COVID-19” (‘government measures of preventive isolation taken as a result of the
COVID-19 pandemic’) (see Figure 6). This strategy has not been found in DLE so far.


d) Unsurprisingly, specific words which are unlikely to be used with other
referents or in future situations also include references to the pandemic in their
definition, e.g., coronabono ‘coronabond’ (“título de deuda común europea de
emisión única creado para mitigar la crisis económica generada por la
pandemia de COVID-19”, ‘Type of European bond . . . created to mitigate the economic
crisis caused by the COVID-19 pandemic’), poscoronavirus ‘post coronavirus’
(“del periodo posterior a la pandemia provocada por el coronavirus causante de
COVID-19 o relativo a él”, ‘of the period after the pandemic resulting from the
coronavirus that causes COVID-19 or relative to it’). The DLE has not included
this kind of headword either.


In connection to this, it is clear that the low degree of stability of the neologisms is a
problem in terms of lexicographic representation since, on the one hand, they are
words that can easily change meaning, in which case their definition will become
outdated, and on the other, as we have seen before, their reference may change. For
example, covidivorcio ‘covidivorce’ is defined in the Antenario as “divorcio matrimonial
producido en el marco de la situación de aislamiento a causa de la pandemia de
COVID-19” (‘divorce that took place while in lockdown during the COVID-19
pandemic’). This definition refers indirectly to a 2020 lockdown; however, the pandemic
has not finished yet and the word covidivorcio may end up being used for any divorce
anticuarentena m y f, f y adj


Año de la primera documentación: 2020 


Definición 1 m y f Persona que se opone a las disposiciones gubernamentales
de aislamiento preventivo implementadas a causa de la pandemia 
de COVID-19 
2 f Oposición a las disposiciones gubernamentales de aislamiento 
preventivo implementadas a causa de la pandemia de COVID-19 
3 adj Que se opone a las disposiciones gubernamentales de 
aislamiento preventivo implementadas a causa de la pandemia de 
COVID-19. 


Contextos Además, el presidente de Argentina, Alberto Fernández, comparó a
los anticuarentena con los terraplanistas: “Hacen mucho ruido y los
medios de comunicación presentan como actores de grandes
jornadas patrióticas”, dijo. [The Clinic (Chile), 15/08/2020]


Señalan a Patricia Bullrich y Miguel Ángel Pichetto por fogonear la 
anticuarentena y los acusan de estar detrás de las marchas. [Clarín 
(Argentina), 2/06/2020] 


Los primeros apuntan especialmente contra los grupos 
anticuarentena, que incluso protagonizaron una serie de marchas 
—la última y una de las más masivas el pasado 12 de octubre— 
protestando por la falta de libertad y lo que algunos llaman una 
«infectadura». [El Universal (México), 21/10/2020] 


Nodos ARG


Figure 6: Anticuarentena entry (Antenario). 


in this period, and not necessarily those during lockdown. This may require
adjusting the definition in the future if such a change is observed.

To sum up, semantic change is a feature of every natural language, and
dictionaries are regularly updated to account for it. In this case, however, the timing
has been radically different: representation has been almost immediate, and the
events referred to in the definitions are unfinished. Both factors create problems for
the lexicographic representation of neologisms, notably definitions that are less
reliable and valid for a shorter time.
5.3.2 Geolectal variation


Another aspect of the microstructure, in the case of Antenario, that is affected by 
the unfinished nature of the pandemic is geolectal representation. Attempts were 
made to account for geolectal variants of COVID-19 headwords, even when not all 
of them were originally documented when the relevant data were collected. Also, 
the extension of use may have varied in different countries as the pandemic un- 
folded. Although in theory it would be possible to include these variations, this is 
difficult to do in practice given the number of changes it would involve. 

As a matter of fact, the speed at which COVID-19 neologisms have been in- 
cluded in dictionaries affects dictionaries of neologisms — which, because of their 
very specificity, usually deal with phenomena which are not entirely stable — differ- 
ently than other types of dictionaries. Still, the volume of new words recorded in 
such a short time is unprecedented. 

In the case of DLE, except barbijo, the words and senses related to the pan- 
demic are not marked diatopically, suggesting they are commonly used in all varie- 
ties, even if some of them were hardly used and, when they were, they were used to 
refer to the situation in Spain (e.g. in the Latin American nodes of the Antenas neo- 
lógicas network there are no records of desescalada). 

As regards barbijo ‘face mask’ (sense 2), diatopic labels have been updated: for
example, Uruguay (“Ur”), excluded in DAMER (see Figure 7), has been added (see Figure 8).
As is usually the case with geolectal variants, instead of defining the word there is a 


barbijo. 
I. 1. m. Bo, Py, Ar, Ur. Cinta o correa que pasa por debajo 
de la barbilla y sirve para sujetar el gorro, el sombrero 
o el casco. € barbiquejo. 

2. Bo, Py, Ar. Pieza de tela que cubre boca y nariz, 
utilizada para mantener la asepsia, generalmente por 
médicos y auxiliares. + barbiquejo. 

3. Py, Ar, Ur. Cinta de cuero o cadena pequeña que se 
pasa por debajo de la quijada del caballo y que tiene 
sus extremos unidos a las argollas superiores del freno. 
rur. 4 barbera. 

II. 1. m. Ar, Ur. p.u. Herida en la cara. 
III. 1. m. PR. perdía, ave. + barboquejo; berbequejo; 
berbiquejo. 


Diccionario de americanismos © 2010 
Asociación de Academias de la Lengua Española © Todos los derechos 
reservados 


Figure 7: Barbijo entry (DAMER). 
barbijo
De barba. 


1. m. Sal., Arg., Bol., Par. y Ur. barboquejo. 


2. m. Arg., Bol., Par. y Ur. mascarilla (|| máscara que cubre la boca y 
la nariz para proteger de patógenos). 


3. m. Arg. Herida en la cara. 


Real Academia Española © Todos los derechos reservados


Figure 8: Barbijo entry (DLE). 


mascarilla 


1. f. Máscara que solo cubre el rostro desde la frente hasta el labio superior. 


2. f. Máscara que cubre la boca y la nariz de su portador para protegerlo de la 
inhalación y evitar la exhalación de posibles agentes patógenos, tóxicos o nocivos. 
Mascarilla quirúrgica, sanitaria. 


3. f. Vaciado que se saca sobre el rostro de una persona o escultura, y 
particularmente de un cadáver. 


4. f. Capa de diversos productos cosméticos con que se cubre la cara o el cuello 
durante cierto tiempo, generalmente breve, con fines estéticos. 


5. f. Cosmético comercial o preparado casero para regenerar, suavizar y dar brillo al 
cabello, que se aplica después del lavado y se deja actuar durante unos minutos 
antes del último aclarado. Sería conveniente que además de un acondicionador te 
aplicaras una mascarilla. 


Figure 9: Mascarilla entry (DLE). 


cross-reference to mascarilla ‘face mask’ (see Figure 9), but also a specification that
adds information: “para protegerlo de la inhalación y evitar la exhalación de posibles
agentes patógenos, tóxicos o nocivos” (‘to protect from the inhalation and avoid the
exhalation of possible pathogenic, toxic or harmful agents’). However, whereas in
mascarilla multiword units headed by the noun which were frequent in everyday
discourse during the pandemic are included as examples (mascarilla quirúrgica, sanitaria
‘medical mask’), no multiword units (e.g. barbijo quirúrgico, barbijo social
‘non-medical mask’) are included in barbijo although, as the Antenas Neológicas data
show, they have been very frequent throughout the pandemic.

To sum up, in each of the lexicographic tools studied, much of the microstruc- 
tural information is, to a certain extent, provisional. 
6 Conclusions


In this final section we discuss the results of our analysis of how the characteristics
of Spanish neology during the COVID-19 pandemic (extremely recent neologisms
referring to a phenomenon still in process, which provides little time to evaluate
frequency of use and degree of stabilization of the items) have affected the criteria
applied for the inclusion and treatment of neologisms in different types of
lexicographic tools and, as a result, dictionary typology and the social role of dictionaries.

As regards criteria for inclusion of neologisms, in the bilingual dictionary
TREMEDICA, many of the items suggested as Spanish equivalents are proposals
coined by the authors or are ephemeral, as documented by Antenas Neológicas.
Although their inclusion may be driven by the aim to anticipate users’ needs,
especially translators’, they are often forms which have hardly been verified in use. This
can become a problem in the field of lexicography: once documented in this way,
these items can be retrieved later by other lexicographic tools
or the press as evidence of actual use. Furthermore, the need for urgent compilation
has also led to a lack of systematicity in the microstructure: not all entries
have the same type of information in the same fields (the fields “concepto” and
“nota” often seem to be used interchangeably) and the synonymous status of variants
is not clear.

As regards monolingual dictionaries of Spanish, when it comes to criteria for
inclusion it is apparent that relevance, dispersion of occurrence (vis-à-vis the high
frequency of a narrow range of textual types), the witness nature of the items and
naming needs have all been considered. However, the chronological criterion
and, more broadly, the criterion of stabilization (as opposed to the ephemeral
nature of some new coinages) have not always been applied rigorously.

In the case of the DLE, questions arise about whether users’ searches for what they
may perceive as neologisms are a working criterion for dictionarization of a functional
type. In other words, if there is interest in a certain item which is shown to be in
current use, it should be included in the dictionary, whereas if it is not searched for, its
inclusion is not justified. For example, since August 2020 no searches have been made for
acuarentenamiento ‘lockdown’, anticuarentena ‘anti-lockdown’ or antipandemia
‘anti-pandemic’, while there have been 4763 searches for cuarentenar ‘to lock down’.

As for treatment in the microstructure, in TREMEDICA, the extremely new na- 
ture of the neologisms is evident in the amount of extralinguistic or usage explana- 
tions that are necessary to complete the information conventionally provided as 
equivalents or definitions. In our monolingual dictionaries, this is more clearly 
seen in the definitions. Even if the Antenario, as a dictionary of neologisms, includes 
non-fully stabilized lexical items, the resources deployed to anticipate the extension 
of reference of such recent neologisms are, in our view, more suitable than the ones 
used by the DLE. 
Clearly, the degree of institutionalization of neologisms is a criterion that has
been significantly influenced (one may dare say distorted) by the unfinished and 
unstable nature of the phenomenon of the pandemic, affecting both monolingual 
dictionaries analyzed for this study. 

Indeed, stability and/or stabilisation seem to have been an important factor both
in the selection and the definition of COVID-19 words in the DLE, i.e. not just stability
of form, but also the likelihood of permanence: most of the words included in the
2020 update are patrimonial words (which may be why a lower frequency word such
as encuarentenamiento ‘lockdown’ is included but a widely used calque such as
sanitizar ‘to sanitize’ is not) that can be used again in the future, or that could have been
included in the dictionary, i.e. not restricted or tied to a transitory situation or period.
The DLE thus honours the RAE tradition. However, this condition seems to be
necessary but not sufficient to include words in the dictionary. DLE users’ needs tend to
take a back seat and prescriptive considerations are privileged.

This discussion would not be complete without a few lines about an
unexpected turn the situation took in April 2021, when the Diccionario Histórico
de la Lengua Española [DHLE] was first published online, somewhat modifying the
lexicographic landscape in Spanish. In the presentation, the dictionary claims to
“aim to describe every aspect (i.e. diatopic, diastratic and chronological) of the
history of the lexis of Spanish” (our translation). Surprisingly, the headword list
(which has been updated periodically since its first publication) includes a large
number of recent lexical units, most of which are not included in the DLE and
were created in 2020-21, derived from corona- (28) and COVID- (27), e.g. coronoico
‘coronavirus denier’, covidilio ‘COVID affair’. Each of these is described in
detail in an entry of its own, which provides, among other pieces of information,
a definition and real examples of use, as well as the number of documents
the item has been found in. See, for example, the entry for coronachivato
‘coronasnitch’ (Figure 10):

Only two documents, identified as “docs. (2020-2021)”, are named to support its
existence and inclusion. The first one (Navarro 2020) is a light-hearted commentary
about COVID-19 vocabulary by one of the authors of TREMEDICA (“The prefix corona-
stands out because of its high productivity, used in more or less humorous
neologisms such as coronacrisis [. . .] coronachivatos [. . .] and coronaburrirse ‘coronabore’
(practically any word, as you can see, was coronable in the coronadays of the state
of alarm)”). The second one is another dictionary, TREMEDICA, which, as
mentioned above, and as is common practice in bilingual lexicography, justified by user
needs, often creates the equivalences, without necessarily claiming the word exists
or circulates. A search on Google shows every example of use refers back to the DHLE
entry, often mockingly. There is no evidence the word has been used other than in
COVID-19 vocabulary inventories, not even in social media, which leaves us
wondering what lexicographic methodology was used to formulate the definition in DHLE,
coronachivato, a s. (2020-)

Etim. Compuesto de corona y chivato.

Se documenta por primera vez, con la acepción 'persona que acusa o delata [a otra] por quebrantar las normas establecidas por las
autoridades durante la pandemia del coronavirus', en septiembre de 2020, en un artículo de F. A. Navarro publicado en la Revista
Española de Cardiología (Madrid).

1. s. m. y f. Persona que acusa o delata [a otra] por quebrantar las normas establecidas por las autoridades durante la pandemia del
coronavirus.

Sinónimo: covichivato, a
docs. (2020-2021) 2

2020 Navarro, F. A. "Covid-19" [23-09-2020] Revista Española de Cardiología (Madrid) Esp (HD)

La mayoría de los neologismos que nos trajo la pandemia covídica, no obstante, fueron de origen popular, humorísticos y pensados como flor de un
día. Si en inglés los hablantes de a pie dieron en llamar al SARS-CoV-2 the rona o Miss Rona (por abreviación de corona), entre nosotros vimos nacer
también alias coloquiales como coronabicho, acojonavirus, cabronavirus, carallovirus, cojonavirus, confinavirus, coñazovirus o coronito. Destaca por su
productividad el prefijo corona-, usado en neologismos más o menos jocosos como coronacrisis, coronabonos, coronacoma (económico),
coronacompras, coronadivorcios, coronafiestas, coronapijos, coronachivatos, coronabilis, coronaplausos y coronaburrirse (prácticamente cualquier
palabra, como puede verse, fue coronable en los coronadías del estado de alarma).

2021 [...] Esp (HD)

Diccionario histórico de la lengua española

Real Academia Española © Todos los derechos reservados


Figure 10: Coronachivato entry (DHLE). 


other than copying from TREMEDICA (which, in fact, offers a humorous definition, 
see Figure 1 and footnote 9) or basing it on formal considerations. 

The hasty inclusion of such neologisms (which, in many cases, one may even hesitate
to classify as ephemeral, since they have never actually been used in speech)
can have the effect, as suggested above, of distorting linguistic reality. The word is
assumed to exist because it has been included and given full treatment in a RAE
dictionary (the DHLE) and many users, given media coverage, assume it has been
included in the DLE.

This, in turn, and understandably, undermines the credibility of the general
dictionary, as was evident in comments in social media, and creates confusion, given the
RAE's traditionally conservative approach (Bernal/Freixa/Torner 2020) and the fact
that many other words Spanish speakers use in their everyday life are excluded (or
banned) from either dictionary.

This leads us to conclude there has been circularity in Spanish lexicography
between authors' neologisms and occasionalisms connected to the COVID-19
pandemic recorded in different lexicographic tools (often without verification
of the use of those words), the DHLE, their use in the press and their
social circulation as mentions.

This is all the more striking if we consider the role dictionaries play in
legitimizing language use, “even though, in theory, they are only supposed to provide a
description of the vocabulary used by members of a community” (particularly in the
case of historical languages such as Spanish), and as reference works that develop
“the standard of a language and an identity”, as pointed out in Rodríguez Barcia/
Moskowitz (2019: 3).

As ten Hacken/Koliopoulou (2020: 129) suggest, “dictionaries are used as an au- 
thority and interpreted as gatekeepers”, which is why any word whose use has not 
been verified may still be socially regarded as sanctioned and accepted as a word 
belonging to the language once it is included in the dictionary. 

Our claim about circularity in representation in Spanish lexicography and its 
impact on how COVID-19 pandemic words circulated socially leads us to suggest three
issues that need to be further studied: (i) marketing, (ii) the notion of neologism 
itself, and (iii) typology of dictionaries. 

First, marketing considerations may have played a role in such circularity, 
modifying established criteria for inclusion (or even acknowledgment) of head- 
words in dictionaries such as the DHLE. As ten Hacken/Koliopoulou (2020: 129) 
point out: “As Kilgarriff (2013: 81) notes, [these words] might not be very important
for an objective description of the language but they are loved by marketing teams 
and reviewers”, somehow diverting the objectives of lexicography. 

Second, regarding the concept of neologism itself, in the Spanish tradition the
lexicographic criterion, especially vis-à-vis the DLE, plays a defining role when
considering the loss of neologicity of a neological item. Inclusion in the DLE determines
that a word is no longer neological. This is why the Antenario, a tool which only
deals with neologisms, ended up not publishing in its December 2020 special edition
lexical items (e.g. coronavírico -ca ‘coronavirus’ adj., COVID-19, desconfinamiento
‘lifting of lockdown’) which, from a chronological and/or psycholinguistic
perspective, were actually neological.

Finally, we find that our starting hypothesis, namely that there is a certain degree of
overlap of some features traditionally thought to be specific to each type of
dictionary, has been confirmed. Dictionaries which, unlike dictionaries of neologisms
(which make no claim to finality or stability regarding the place in the language of
the items collected), are not restricted to these phenomena or not supposed to collect
them, ended up recording ephemeral or witness items with a very low or null frequency
of use. Those words are then defined considering an extension of reference
and use that cannot be verified yet. The properties of being transition and/or remedial
devices do not seem to be exclusive to dictionaries of neologisms when it comes
to dealing with COVID-19 lexis.
Specialized voices in the 23rd edition
of the Diccionario de la lengua española:
Analysis of the COVID-19 field
and its neologisms


1 Introduction 


The unexpected spread of the Coronavirus has produced, among other things, linguistic
and lexicographic changes with characteristics that are being studied as the pandemic
unfolds. Neologisms were quickly coined and picked up around the world,
while new words were created or new meanings were given to existing words.²
Naturally, Spanish language speakers were no strangers to this trend, which was
soon examined by the media, linguistic observatories, and lexicographic works.
Thus, in late 2020, when the updated edition of the Diccionario de la lengua española
(DLE, 23.4, 2020)³ came out, the lexicographic changes announced included several
pandemic-related changes (https://dle.rae.es/contenido/actualizaci%C3%B3n-2020).


1 This study was conducted in the framework of the Research Program on Terminology, Special- 
ized Lexicography, and Organization of Knowledge proposal, financed under the call for “Research 
and Development Groups” of the Sectorial Commission for Scientific Research (CSIC), Universidad 
de la República, Uruguay (2018-2022). The researchers jointly responsible for the program are 
Mario Barité and Magdalena Coll. 

2 The dimension of the trend has been such that, shortly after the start of the pandemic, various 
“Coronadictionaries” drawn up by readers, journalists, and others began circulating on social 
media and traditional media (see, for example, https://www.lanacion.com.ar/sociedad/coronavirus-zoompleanos-tapabocas-palabras-nacieron-o-se-nid2370269/).

3 This is an academic lexicographic work that is an essential authoritative reference. It originated 
as one of the leading objectives of the Spanish Royal Academy (Real Academia Española), in the 
framework of its foundation, in Madrid in 1713. Twenty-three editions had been published as of 
the year 2014. Starting with the 21st edition (1992), there was an increase in the number of meanings
specific to individual Spanish-speaking countries, whose language academies are part of the Asso- 
ciation of Academies of the Spanish Language (Asociación de Academias de la Lengua Española, 
ASALE), formed in 1951. The DLE's purpose is to compile the general lexicon used in Spain and in 
Hispanic countries throughout the world and it is aimed primarily at speakers whose mother 
tongue is Spanish. It is a normative work and receives more than 90 million queries each month on 
its online version (dle.rae.es). 
Not only were new words connected with the coronavirus included; some old definitions
were also revised to adapt them to the new global situation.

This is, without a doubt, an unprecedented scenario in terms of the updating practices
of academic lexicography: new technologies make it possible for the
dictionary to be updated at a pace never before seen in the history of the DLE. With
the pandemic, exceptional decisions have been made, considering the short time 
that elapsed, in many cases, between the emergence of a COVID-related word and 
its publication in the dictionary (cf. Battaner 2021). 

There is also another factor that has changed the academic lexicographic landscape:
the boost that the historical dictionary Diccionario histórico de la lengua española
(DHLE, 2013-)⁴ has received in recent years, as will be discussed below.

Academic lexicography is therefore undergoing a unique moment, both be- 
cause of the pace at which the DLE is updated and because of the coexistence of 
that process with a renewed and dynamic DHLE. This is the situation in which lexi- 
cography has addressed and is addressing, at the academic level, the vocabulary of 
a pandemic of extraordinary dimensions. 

Along these lines, this article has a double objective. First, it seeks to offer an ini- 
tial approach, with critical notes, to the group of pandemic-related neologisms incor- 
porated into the DLE in the year 2020. To that end, the trends in the academic 
dictionary's incorporation of neologisms will be reviewed, focusing in particular on 
specialized language neologisms. Second, the article presents the design of a research 
study that allows for the examination of any new words beginning with CORONA- 
added to the DLE and the DHLE. An assessment will be made of the particularities of 
the DLE and the DHLE regarding the incorporation of the new words, as well as the 
degree of correspondence or complementarity between the two works in this sense. 
This will show the complementary roles that the DLE and the DHLE are currently ac- 
quiring. In this sense, the new additions open up a debate on the treatment of neolo- 
gisms in academic lexicography, in a particularly unique scenario. 

This paper will thus give a brief overview of the policy for incorporating neolo- 
gisms into the academic dictionary (section 2), with special attention to technical neo- 
logisms. The general characteristics of the updating practices of academic dictionaries 


4 The purpose of the DHLE, formerly known as the New Historical Dictionary of Spanish (Nuevo dic- 
cionario histórico del español, NDHE), is to present in an organized manner the evolution of the Span- 
ish lexicon over time and up to the present. It is a “complete dictionary” accessed for free on the 
Internet that seeks to compile the entire lexicon, covering every period and every region where Spanish
is and has been spoken. In doing so, it shows the changes that words have experienced in meaning
over time and even the accidental linguistic uses of a given period. It is a long-standing project
that saw several frustrated attempts over the years, but whose development has received a decisive
boost with the creation of the Pan-Hispanic Network of Academies, Universities, and Research Centers
for the Production of the Historical Dictionary of the Spanish Language (Red Panhispánica de
Academias, Universidades y Centros de Investigación para la Elaboración del Diccionario histórico de
la lengua española) in 2021 (https://www.rae.es/dhle/).
will be addressed, with respect, in particular, to terms that emerged in the pandemic
(section 3), with subsection 3.1 dealing with the general aspects of the updating process
for the 23rd edition of the DLE, and subsection 3.2 with those of the DHLE. The specific
research on words beginning with CORONA- is discussed in section 4, which is divided 
into a description of the design of the research study (subsection 4.1) and its findings 
(subsection 4.2). The paper concludes with some final considerations in section 5. 


2 Neologisms and technical neologisms 
in academic dictionaries 


As is well known, the definition of neologisms is somewhat controversial and their
lexicographic treatment even more so. Lexicographers have, in fact, been discussing
the criteria for the inclusion of neologisms in dictionaries since the field was
first developed, but in recent years there has also been a specific line of theoretical 
research on the subject (e.g. Bernal et al. 2020: 593). 

As early as 1992, Alvar provided an overview of the treatment given to neologisms
in the Diccionario de la Real Academia Española (DRAE).⁵ He argued that the
general dictionary cannot incorporate all the words that emerge, “as [to be incorporated,
words] require widespread use among speakers, authorities that use them,
and a stability to ensure they are not birds of passage. The process may seem slow,
but it is the only way.” (Alvar 1992). This classic conception has clearly changed,
primarily with respect to a word's stability, which is something that cannot be measured
given the very short time spans separating the start of the pandemic and the
DLE's incorporation of pandemic-related words.⁶


5 Historically, the main academic dictionary was known as DRAE (Diccionario de la Real Academia 
Española) but “since its last edition in 2014, the acronym DLE (Diccionario de la lengua española) is 
being furthered because of its identification and recognition in the lexicographic landscape, as this 
acronym is optimal and corresponds to the official name the dictionary has always had” (Moreno 
Moreno 2019: 86). For this reason, we use the acronyms DRAE or DLE, as appropriate. 

6 Usage, a criterion considered valid for incorporating a word into the dictionary, has also undergone 
several re-conceptualizations since the first dictionary of the Real Academia Española (1726-1739), 
when it was based on the use by authors who “have treated the Spanish language with the greatest 
accuracy and elegance” (https://www.rae.es/obras-academicas/diccionarios/diccionario-de-autori 
dades-0). Modern-day usage is documented with data from a corpus that covers different registers, 
styles, and geographical areas. But in recent years another tool has emerged that is strongly linked to 
usage: the possibility of retrieving the queries made by users in the DLE regardless of whether the 
words searched for are in the dictionary or not. Records are now available of word searches made by 
users in the DLE for words not yet included in the dictionary, but for which search frequency is very 
high. These can be analyzed and systematized by lexicographers, and that, in turn, can influence deci- 
sion-making when updating the dictionary. 
As observed by Adelstein and Freixa (2013), the incorporation of a neological
form into a dictionary usually responds to a combination of different criteria, including
frequency, formal, semantic, and documentation criteria. Thus, the words
most likely to be included in a dictionary are those that are used very frequently,
highly necessary for naming purposes, internationalized, easily adaptable, with a
derivative family, etc. (Bernal et al. 2020: 606). These criteria must also be weighed
according to the type of dictionary in question (Bernal et al. 2020: 594).

Bernal et al. analyze the words added to the 23rd edition of the DLE, updated in
2019, to “infer the non-explicit criteria used to perform the selection and contrast 
them with the criteria proposed by specialized literature” (2020: 608). This paper 
will only consider the pandemic-related words that were added in 2020, that is, in 
the 23.4 edition of the DLE. 

In their analysis, Bernal et al. (2020: 608) suggest that the above criteria have 
been taken into account unevenly in the academic dictionary. The frequency crite- 
rion does not appear to have been used as an exclusion filter, while the formal crite- 
rion has, given that all “the derived and compound words [incorporated into the 
academic dictionary] are correctly formed words (in the sense that they follow the 
rules for correctly forming words)” (Bernal et al. 2020: 609). Thus, the inclusion of
word families “lends consistency to the dictionary in the sense that complete deriv- 
ative series are provided or else derivative series that already had a representative 
in the dictionary are completed” (Bernal et al. 2020: 609). As for semantic criteria, 
the selection of entries added to the 2019 edition of the DLE would appear to under- 
score the denominative need. Bernal et al. observe that a significant number of neo- 
logisms that have an entry in the dictionary are words that belong to scientific 
subject areas, such as medicine, biochemistry, or architecture, which lost their spe- 
cialized terminological value once they were incorporated into the general language 
(2020: 610). Moreover, documentary criteria do not appear to have been decisive in 
the inclusion of certain words in the DLE edition studied by Bernal et al. (2020). As 
will be seen in the next section, the criteria found by Bernal et al. for the 2019 incor- 
porations are the same that were applied for the 2020 incorporations. 

The authors conclude that the criteria that appear to have had more weight in 
the decisions to incorporate words are the criteria connected with the work's inter- 
nal logic, an aspect that is maintained in the updated version considered here: 


[O]n the one hand, it is observed that many of the new words added complete derivative series; 
on the other, the lexicon of subject areas present in the dictionary is enhanced, thus completing 
the coverage thereof, despite the fact that in some cases they are highly specialized words, a char- 
acteristic that apparently should be pondered as a negative aspect. (Bernal et al. 2020: 613) 


The treatment of specialized words in the academic dictionary merits some reflections.
A brief examination of that academic work reveals that many specialized
words are not labeled as such. For example, the DLE (2014) assigns the diatechnical
label ‘Zool.’ to families, phyla, or groups, as in the case of echinoderms, but not to
“starfish,” one of its species.⁷

In this context, it is important to bear in mind the information that is provided 
in the forewords and introductions of the last editions of the DLE. Both the 2001 
and the 2014 editions omit any reference to the nature of the diatechnical labels 
and their treatment. Only one (isolated and indirect) consideration is made, and 
that is when establishing the precedence of the labeled meanings. Neither edition 
provides an explicit explanation of what a diatechnical label is, and no definition of 
it is given in the body of the dictionary either (Barité/Blanco 2014). 

With respect to the application of diatechnical labels, in the 2001 edition there 
is an interesting explanation of the so-called “technical words”: “The dictionary in- 
cludes any words and meanings from different fields of knowledge and professional 
activities whose current use — technical archaisms are also excluded - has gone be- 
yond their original scope and spread to frequent or occasional use in everyday or 
cultivated language” (DRAE 2001: xlviii). In its 2014 edition, there is no mention of 
how diatechnical labels are applied. There is only an example in the sample entries 
in which the diatechnical label is explained as the specialized field to which the 
word or sense corresponds. 

According to Barité/Blanco (2014), three different aspects of the treatment of 
specialized words in the academic dictionary can be inferred from the information 
in the 2001 edition of DRAE: (a) technical archaisms are excluded from the dictio- 
nary; (b) also excluded are specialized words or meanings that have not gone be- 
yond their field and are therefore only known and used in internal communications 
in that field; and (c) specialized words and meanings whose use, whether frequent 
or occasional, has spread beyond its field are included. However, it is not always 
clear which words or meanings included merit a diatechnical label. 

With the information available, it would appear that no other criteria were applied
for the updates to the 23rd edition of the DLE, so that it can be concluded that
the above criterion still applies. It is actually a very logical criterion, which accounts
for the de-terminologization of certain expressions or an unusual socialization
of certain words, without losing their condition of specialized word. It is
reasonable to assume that in the innovative 24th edition, on which the Spanish language
academies have been working for some years now, this type of information
will be made explicit.

The inclusion of technical neologisms in dictionaries has a tradition of its own, 
which is strongly linked to the diversification of professional activities and fields of 
knowledge that give rise to specialized languages. There are also different real-life 


7 A diatechnical label, also known as a “subject label”, “specialization label”, or “thematic label”, 
qualifies the definition of a lexical unit by indicating the scientific discipline, technical field, pro- 
fession, or specialized area it belongs to (Martínez de Sousa 2009: 147). 
events, such as the pandemic that began in 2019, that force established disciplines to
reallocate resources and quickly direct research toward problem-solving, in this case 
problems concerning the classification of a virus and the diagnosis, prevention, and 
treatment of the disease it causes. This type of situation is fertile ground for the rapid 
emergence of new terms that are quickly picked up and become widespread. 

All these circumstances naturally occur in the oral and written language used 
by specialists among themselves. These terms, as such, may be included in the dic- 
tionary, and, if they are, they may be accompanied by the corresponding technical 
label and an appropriate definition. As the pandemic progresses, these terms will 
be used by specialists in their communications with lay persons, or between health 
authorities or specialized reporters and ordinary citizens. These terms will also 
begin to be used in everyday language, making their lexicographic treatment differ- 
ent: they will not necessarily carry a label indicating a specialized field and their 
definition will tend to be less technical. 

Things are not, however, that simple or linear. Strictly speaking, specialized lan- 
guages would only include scientific terms used by specialists in their specialized 
communications or in dissemination activities, and which, for example, are part of 
the formal (scientific or technical) classifications of their specialized fields. Within 
this framework, and from a lexicographic perspective, in the general dictionaries of a 
language, only those words or meanings that contain a diatechnical label could be 
considered specialized. Applying instead a more open criterion, the set of specialized 
words or meanings could be expanded to include those that directly or indirectly 
refer to a specialization, and that are usually jargon or technical slang words, or just 
constructions coined for communication among non-specialists, even when they are 
adopted because of their picturesque quality. 


3 Academic dictionary updates in the context 
of the pandemic 


3.1 General characteristics of the updating process
of the 23rd edition of the DLE


The most recent edition of the DLE is the 23rd, and while it came out in 2014, it has
been regularly updated in its online searchable version, which became available in
2015. Successive batches of modifications approved by the language academies,
which will all be ultimately included in the 24th edition, have been periodically
incorporated.⁸ The last such update, which is what interests us here, was announced
in December 2020 and it is known as the 23.4 edition of the DLE.⁹

Like all dictionaries of such magnitude, the DLE has been updated periodically since
its first edition in 1780. In the eighteenth century, there were two updates; in the
nineteenth, ten; and in the twentieth, eight. In the twenty-first century, the 22nd edition
was released in 2001 and the 23rd edition came out in 2014, as noted above. In this
sense, as of the twenty-first century, technological developments enabled a change in
the very concept of updating, and, following the 23rd edition, updates are performed
continuously. This has generated different and renewed versions of the edition pub- 
lished originally on paper in 2014. The dictionary is no longer something static; rather, 
changes can be made regularly to it and released as they are made. This is an unprec- 
edented situation for Spanish academic lexicography. 

The words connected with the pandemic that were added to the DLE as of December
2020 can be classified according to different aspects (cf. Battaner 2021): they can be
considered hereditary words already present in the DLE (such as distanciamiento ‘distancing’,
normalidad ‘normality’, barbijo ‘mask’, aislamiento ‘isolation’); new combinations
(distanciamiento social ‘social distancing’ or distancia de seguridad ‘safety distance’,
nueva normalidad ‘new normal’, barbijo quirúrgico ‘surgical mask’, aislamiento social
‘social isolation’, contacto estrecho ‘close contact’); or neologisms formed by
derivation or composition (anticuarentena ‘anti-quarantine’, desconfinamiento ‘unlockdown’,
intrafamiliar ‘intra-family’, pospandemia ‘post-pandemic’). There are also
unadapted anglicisms (such as home office), often linked to new
practices popularized in 2020.

In its 23rd edition, updated in December 2020,¹⁰ the DLE features changes in the
meanings of confinado ‘confined’ and confinar ‘confine’, while the entry for confi- 
nado, da ‘confined’ (masculine and feminine) was also amended. One new meaning 
of confinamiento ‘confinement’, related to ‘lockdown’, was added: 


8 The 24th edition of the DLE will differ from all previous editions and will involve a thorough overhaul
of the work in a wide range of its structural elements. The new DLE “will be digital from its
very conception. It is no longer a matter of converting into an electronic resource something that 
was conceived, and in part developed, as a work intended to be published on paper, but of creating 
a genuinely electronic dictionary, with everything such a fundamental fact implies” (https://www. 
asale.org/noticias/la-rae-presenta-la-primera-actualizacion-de-la-23a-edicion-de-su-dle). 

9 The first update of the 23rd edition was done in December 2017; it was then updated again
in December 2018 and in December 2019. This article focuses on the latest update, in December 2020. 
The academies are currently working on the 23rd edition and on the 24th edition simultaneously. 

10 This update includes more than 2,000 changes. 

11 The Fundación del Español Urgente (Foundation for Urgent Spanish), promoted by the EFE 
Agency and the Real Academia Española, even chose “confinamiento” as the word of the year 
2020. 
confinado, da. [Amended entry]. [. . .] adj. 1. Dicho de una persona: Obligada a vivir en un determinado
lugar. U. t. c. s. || m. y f. 2. Persona sometida a un confinamiento (|| aislamiento impuesto
a una población). || 3. Der. En algunos países, persona que sufre la pena de confinamiento.


[(Adjective) 1. Said of a person: Forced to live in a given place. Also used as a noun. || 2. (Masculine
and feminine) Person subjected to a confinement (|| isolation imposed on a population). ||
3. (Law) In some countries, a person who suffers a penalty of confinement.]


confinamiento. [Amended meaning] m. 1. Acción y efecto de confinar o confinarse. [(Mascu- 
line) 1. Action and effect of confining or confining oneself.] 


confinamiento. [Meaning added]. m. 1 bis. Aislamiento temporal y generalmente impuesto
de una población, una persona o un grupo por razones de salud o de seguridad. El Gobierno 
decretó un confinamiento de un mes. 


[(Masculine) 1 bis. Temporary and generally imposed isolation of a population, a person, or a
group due to health or safety reasons. The Government decreed a month-long confinement.] 


confinar. [. . .] || 2. [Amended meaning]. tr. Encerrar o recluir algo o a alguien en un lugar
determinado o dentro de unos límites. U. t. c. prnl. Se confinó EN su casa. [(Transitive verb) 2.
Lock up something or someone in, or commit them to, a given place or within certain limits.
Also used pronominally. They confined themselves IN their home.]¹²


Cuarentenar ‘quarantine’ and its variants cuarentenear and encuarentenar were
added as verbs in the DLE and one of the meanings of cuarenteno ‘quarantined person’
was amended:


cuarentenar. [Entry added]. tr. 1. Poner algo o a alguien en cuarentena (|| aislamiento preventivo
por razones sanitarias). Cuarentenaron un hospital. U. t. c. prnl. Se cuarentenó durante la
epidemia. || intr. 2. p. us. Pasar un período de cuarentena (|| aislamiento preventivo por razones 
sanitarias). Se permite el regreso a la ciudad de origen para cuarentenar. 


[1. (Transitive verb) To place something or someone in quarantine (|| preventive isolation for 
health reasons). They quarantined a hospital. Also used pronominally. They quarantined them- 
selves during the epidemic. || 2. (Intransitive verb, scarcely used) To go through a period in 
quarantine (|| preventive isolation for health reasons). They are allowed to return to their home 
town to quarantine.] 


cuarentenear. [Entry added]. intr. 1. Pasar un período de cuarentena (|| aislamiento preventivo
por razones sanitarias). Es más llevadero cuarentenear con alguien. || tr. 2. p. us. Poner algo o a
alguien en cuarentena (|| aislamiento preventivo por razones sanitarias). Tendremos que cuarentenear
el ganado. Las autoridades cuarentenearon el crucero.


[1. (Intransitive verb) To go through a period in quarantine (|| preventive isolation for health 
reasons). Quarantining is more bearable if you do it with someone. || 2. (Transitive verb, scarcely 
used) To place something or someone in quarantine (|| preventive isolation for health reasons). 
We will have to quarantine the livestock. The authorities quarantined the cruise ship.]


12 https://www.rae.es/noticia/la-rae-presenta-las-novedades-del-diccionario-de-la-lengua-espa 
nola-dle-en-su-actualizacion. 
encuarentenar. [Entry added]. tr. Poner algo o a alguien en cuarentena (|| aislamiento preventivo
por razones sanitarias). Si alguien se infecta, habrá que encuarentenar a toda la colonia.
U. t. c. prnl. Me encuarentené por precaución.


[(Transitive verb) To place something or someone in quarantine (|| preventive isolation for
health reasons). If anyone is infected, the whole colony will have to be quarantined. (Also used
pronominally) I quarantined myself just to be safe.]


cuarenteno, na. [. . .] || 7. [Amended entry]. f. Aislamiento preventivo a que se somete durante
un período de tiempo, por razones sanitarias, a personas, animales o cosas.


[|| 7. (Feminine) Preventive isolation that persons, animals, or things are placed under for
a period of time, for health reasons.]¹³


Also, desconfinamiento ‘unlockdown’ (noun) and desconfinar ‘unlockdown’ (verb) 
were added as antonyms of confinamiento and confinar, respectively: 


desconfinamiento. [Entry added]. m. Levantamiento de las medidas impuestas en un 
confinamiento. 


[(Masculine). Lifting of the measures imposed in a confinement (or lockdown).] 


desconfinar. [Entry added]. tr. Levantar las medidas de confinamiento impuestas a una pobla- 
ción, o a parte de ella, en un territorio u otro lugar. U. t. c. intr. y c. prnl. 


[(Transitive verb) To lift confinement (or lockdown) measures imposed on a population, or 
part of it, in a territory or other place. (Also used as intransitive and pronominally).]¹⁴


New definitions were given to the entry mascarilla to adapt it to the meaning of
‘mask’:


mascarilla. [. . .] 2. [Amended meaning]. f. Máscara que cubre la boca y la nariz de su portador
para protegerlo de la inhalación y evitar la exhalación de posibles agentes patógenos, tóxicos o
nocivos. Mascarilla quirúrgica, sanitaria.


[(Feminine) 2. A mask that covers the mouth and nose of the wearer to protect them from inhaling,
and to prevent them from exhaling, possible pathogenic, toxic, or noxious agents. Surgical,
sanitary mask.]¹⁵


Coronavírico ‘coronaviral’ and coronavirus were added as terms from the field of
medicine. It is interesting to note that an etymology for coronavirus was also added, 
indicating that it is derived from the English word coronavirus, but also recognizing 
Latin as the language from which it was derived originally. 


coronavírico, ca. [Entry added]. adj. Med. Perteneciente o relativo al coronavirus. 


[(Adjective, medicine) Of or relating to coronavirus.] 


13 Ibid. 
14 Ibid. 
15 Ibid. 
coronavirus. [Entry added]. m. Med. Virus que produce diversas enfermedades respiratorias
en los seres humanos, desde el catarro a la neumonía o la COVID. [(Masculine, medicine) A
virus that causes various respiratory diseases in humans, from the common cold to pneumonia
or COVID.]


coronavirus. [Etymology added to the entry]. (Del ingl. coronavirus, de corona ‘corona solar’,
por el aspecto del virus al microscopio, y este del lat. corōna ‘corona’, y virus ‘virus’, y este del
lat. virus ‘veneno’, ‘ponzoña’).


[From the English coronavirus, from corona ‘solar corona’, because of the appearance of the
virus under the microscope, and this from the Latin corōna ‘corona’, and virus ‘virus’, derived
in turn from the Latin virus ‘venom’, ‘poison’.]¹⁶


The term coronavirus would appear to be a prototypical case of the model the Acad- 
emy is adopting in an effort to address criticism it has received in the past. A brief 
definition is provided, which is simply phrased and straightforward. It includes the 
core propositions of the concept, so it can serve as a starting point for readers to 
work with texts of greater scope or depth, such as specialized texts. 

This is an accurate definition, as it does not deny the fact that while it has 
gained visibility in the current pandemic, the term coronavirus has existed since 
1968 (Ochoa Montes/Ferrer Marrero 2021), with seven varieties of the virus having 
been discovered thus far, including SARS-CoV-2. It also avoids the need to have an 
entry under the specific name of the virus that caused the pandemic. 

At the same time, the definition of coronavirus mentions COVID, although the 
World Health Organization announced on February 11, 2020, that the official name of 
the disease would be COVID-19, a contraction of the term coronavirus disease 2019. 
However, in the DLE, COVID-19 is recorded but it refers back to COVID, perhaps be- 
cause it is assumed that that is the most frequent form used in the Spanish language. 

A new entry was added for COVID as a medical term and its English etymology 
was also added. The DLE states that this word can be a feminine or masculine 
noun. In the process of its adaptation to different varieties of Spanish, it has devel- 
oped two different accentuations: covid and cóvid, although this is not mentioned 
in the dictionary. 


COVID. [Entry added]. m. o f. Med. Síndrome respiratorio agudo producido por un coronavi- 
rus. COVID-19. m. o f. Med. COVID. 


[(Masculine or feminine, medicine) Acute respiratory syndrome produced by a coronavirus. 
COVID-19. (masculine or feminine, medicine). COVID.] 


COVID. [Etymology added to the entry]. (Del ingl. COVID, y este acrón. de coronavirus disease
‘enfermedad del coronavirus’).


16 Ibid. 
[From the English COVID, the acronym of coronavirus disease.]¹⁷


The criteria that Bernal et al. (2020) had already observed in the 2019 update are 
also manifested in this update. Among these criteria, the most prominent in aca- 
demic lexicographic practice is the consistency in updating a word with its deriva- 
tives. That lends rigor and internal consistency to the dictionary. In addition, a 
cautious attitude can be observed in the assessment of the frequency of a word and 
its stability. There is no urgency, at least in specialized language, to incorporate 
new words or meanings. As a result of the pandemic, only two words with the label
‘Med.’ were included: coronavírico, ca and coronavirus. As we will see in section 4,
the DHLE takes more liberties.


3.2 General characteristics of the DHLE update 


One of the most noteworthy developments for Spanish language lexicographic re- 
search is the availability, since 2013, of some entries of the Diccionario histórico de 
la lengua española (DHLE, 2013-). This is the result of an academic pan-Hispanic 
project, which received its final ratification with the establishment, in April 2021, of 
the Red Panhispánica de Academias, Universidades y Centros de Investigación para 
la Elaboración del Diccionario Histórico de la Lengua Española (Pan-Hispanic Net- 
work of Academies, Universities, and Research Centers for the Production of the 
Historical Dictionary of the Spanish Language, https://www.rae.es/dhle/). 

According to its introduction, it is a “digital-native dictionary that seeks to fully 
describe (diatopically, diastratically, and chronologically) the history of the Spanish 
language lexicon,” as well as to “analyze the history of the lexicon in a relational 
perspective, addressing the etymological, morphological, and semantic connections 
that are established between words” (DHLE 2013-). Because the DHLE is a diachronically
oriented database, its compilers aim to organize the entries by semantic fields or lexical
families. It is important to note that the DHLE shares the same databases with the
DLE, but it has its own seal and independent funding.

A detailed explanation of the DHLE's structure, its characteristics, and the type 
of relations between words and meanings (morpho-etymological, change mecha- 
nisms, semantic) can be found in a document that is available on its website. The 
document also clearly defines the terms of reference (lemma, sub-lemma, hyper- 
lemma, meaning, sub-meaning, variant, syntactic scheme, documentation, multi- 
word unit, and lexical family). In addition, four types of labels are identified for the 
meanings included in the DHLE: diatopical, pragmatic, sociolinguistic, and, most 
importantly for this study, diatechnical or specialization labels. 


17 Ibid. 
Available online with more than six thousand entries, the aim of the DHLE is to
show, in an organized manner, the evolution of the Spanish lexicon over time and up 
to the present. In March 2021 alone, 715 new monographs were added, which led to 
the production of entries for words in different semantic fields and lexical families. A 
difference observed between the DLE and DHLE is that the former usually incorporates 
new words or meanings in more or less numerous batches, on a regular basis; whereas 
incorporations in the DHLE are done by lexical families, among other criteria. 

The context of the pandemic has been particularly productive in terms of record- 
ing new expressions. A brief exploration of the DHLE revealed that some of these ex- 
pressions are related to the virus, but most of them have to do with COVID-19, with 
its prevention and treatment, and with concepts that are only understood in the new 
circumstances. Some of these expressions are scientific in nature, but in other cases 
their inclusion in specialized languages could be debatable. This is the case of
words such as covidengue ‘covidengue’ and covidfobia ‘covidphobia’, for example.
Others seem to be humorous or somewhat picturesque productions. This is the case,
for example, of a significant number of entries beginning with COVI-, such as covicho
‘covidbug’, covidcidio ‘covidcide’, covidiota ‘covidiot’, and covidemia ‘covidemia’.

With the aim of examining in greater detail a sample of expressions in vogue 
during the pandemic, which came to the attention of the DHLE and the DLE, as well 
as assessing their potential novelty, we chose to study a homogeneous universe: 
that of the words that emerged as derivatives or developments of CORONA-. This will 
be explained in the following sections. 


4 The case of words beginning with CORONA- 
4.1 Research design 


As noted in the introduction, this section presents a concrete analysis whose
universe is formed by the expressions beginning with CORONA-, provided they relate
in some way to the coronavirus. In considering only the terms that begin with
CORONA-, the study leaves out many other words related to the pandemic, such as
words beginning with COVI-, or words belonging to the set related to the prevention
of the disease, such as distancia social ‘social distance’, mascarilla ‘face mask’, or
sanitización ‘sanitization’. In this sense, the universe studied constitutes a partial
sample of the total of words or meanings that can refer to the pandemic in the
dictionaries studied.

The corpus includes: (i) the 23rd printed edition of the DLE, also known as the
Tercentenary edition (DLE 2014); (ii) the 23.4 DLE version, which is currently online 
(DLE 23.4 2020); and (iii) the DHLE version available as of the date of this study, 
May 2021 (DHLE 2013-). 
While the three works that make up the corpus are not completely homogeneous,
either in their structure or in their objectives, for the purposes of this
analysis the findings are comparable and allow for general and specific conclusions 
to be drawn regarding the recording status of neologisms, or neologism candidates, 
in the Spanish language in the pandemic scenario that unfolded as of 2019. 

It should be noted that equivalents and variants, which are moreover only in- 
cluded in the DHLE, are recorded together or separately following the criterion used in 
the dictionary itself. In this sense, coronasalmonela and coronasalmonella each have 
their own entry because they are featured separately in the DHLE, but coronadengue 
and corona-dengue are considered as one entry because that is how they are recorded. 


4.2 Research findings 


The data presented below shows, as noted above, the recording status of words be- 
ginning with CORONA- and which are related to the coronavirus, in the three sources 
selected as corpus (DLE 2014; DLE 23.4 2020; DHLE 2013-). 

The thirty-one words identified are coronaplauso ‘coronapplause’, coronabebé
‘coronababy’, coronabicho ‘coronabug’, coronaboda ‘coronawedding’, coronabono
‘coronabonus’, coronabulo ‘coronahoax’, coronachikunguña ‘coronachikungunya’,
coronachivato ‘coronainformer’, coronacompra ‘coronapurchase’, coronacrisis
‘coronacrisis’, coronadengue, corona-dengue ‘coronadengue’, coronadiccionario
‘coronadictionary’, coronadivorcio ‘coronadivorce’, coronafiesta ‘coronaparty’,
coronafobia ‘coronaphobia’, coronahisteria ‘coronahysteria’, coronahistérico, a
‘coronahysteric’, coronalengua ‘coronalanguage’, coronalenguaje ‘coronalanguage’,
coronamanía ‘coronamania’, coronacionalismo ‘coronationalism’, coronapositivo
‘coronapositive’, coronasalmonela ‘coronasalmonella’, coronasalmonella
‘coronasalmonella’, coronaviral ‘coronaviral’, coronavírico, a ‘coronaviral’,
coronavirología ‘coronavirology’, coronavirólogo, a ‘coronavirologist’, coronavirosis
‘coronavirosis’, coronaviroso, a ‘coronainfected’, and coronavirus ‘coronavirus’. All
thirty-one words that fit the search equation are recorded in DHLE, while only two of
them are recorded in DLE 23.4 (coronavírico, ca and coronavirus), and none in DLE 23.
Thus, the only two words beginning with CORONA- that are featured both in DLE 23.4
and DHLE are coronavirus and coronavírico, ca.¹⁸ None of the humorous constructions,
such as coronaplauso, coronabebé, or coronabicho, is recorded in DLE, perhaps in the
understanding that their stability is not ensured.

Four of the thirty-one words included in DHLE were documented before the pandemic
(coronavirus, since 1980; coronavirosis, since 1992; coronaviral, since 1997; and
coronavirología, since 2012). This confirms that, while these documented instances


18 The entries for these two words in both dictionaries are attached as figures 1-4 in Annex I.
predate 2014 (the year in which the DLE 23 edition came out), at that time there were not
sufficient grounds to include any of those words in that dictionary, or in its 23.4
online edition. Documented instances may exist prior to 2019 because coronavirus is a
generic term coined in 1968. In fact, a research study conducted in the United King- 
dom toward the mid-1960s revealed that the first of what are now known as coronavi- 
ruses corresponded to a virus found in chickens suffering from bronchitis, around 
the year 1930 (Tyrrell/Bynoe 1965). SARS-CoV-2 is just one of the viruses in the Coro- 
navirus family. This clarification is necessary to determine which words are neolo- 
gisms or potential neologisms, and which have a long-standing existence, although 
without the exposure they have now. 

Of the thirty-one words recorded in DHLE, only two have a diatechnical label, 
and in both cases, it is medicine (‘Med.’). Coronavirus receives a diatechnical label 
both in the DLE 23.4 and DHLE, while coronaviral has one but only in DHLE, given 
that the word is not recorded in DLE or in DLE 23.4. The DHLE also assigns it a dou- 
ble label: medicine and veterinary. The adjective coronavírico, ca, for its part, has 
the “Med.” label in the 23.4 edition of DLE but not in DHLE. 

There are at least four other expressions, featured only in DHLE, that could be 
considered specialized words that merit a diatechnical label, although they do not 
have one, at least to date: coronavirología, coronavirólogo, coronavirosis, and coro- 
naviroso. These four words have not gone beyond their specialized field and do not 
appear to be used outside it, so that they meet the generally accepted criteria for 
receiving a diatechnical label. 

Coronabono, for its part, is a word clearly connected with the field of econom- 
ics, an area of knowledge that has its own diatechnical label in DLE, and which 
could constitute another case to consider. 

The twenty-seven words in DHLE that have no documented instances prior to 
the pandemic could be considered candidates for full neologisms, taking into ac- 
count also that they were not recorded in the last 23.4 version of DLE. 
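
The counts in this section can be checked with a few set operations. The following is only a sketch: the variable names are illustrative, the headwords are reduced to their masculine base form, and coronadengue stands for the joint entry coronadengue, corona-dengue, all of which are assumptions made for the example rather than features of the dictionaries themselves.

```python
# Headwords beginning with corona- as listed in section 4.2.
# Assumption: gender endings are dropped and joint entries count once.
DHLE = {
    "coronaplauso", "coronabebé", "coronabicho", "coronaboda", "coronabono",
    "coronabulo", "coronachikunguña", "coronachivato", "coronacompra",
    "coronacrisis", "coronadengue", "coronadiccionario", "coronadivorcio",
    "coronafiesta", "coronafobia", "coronahisteria", "coronahistérico",
    "coronalengua", "coronalenguaje", "coronamanía", "coronacionalismo",
    "coronapositivo", "coronasalmonela", "coronasalmonella", "coronaviral",
    "coronavírico", "coronavirología", "coronavirólogo", "coronavirosis",
    "coronaviroso", "coronavirus",
}
DLE_23_4 = {"coronavírico", "coronavirus"}  # online version 23.4 (2020)
DLE_23 = set()                              # printed 23rd edition (2014)

# Year of first documented use in DHLE for the words that predate the pandemic.
PRE_PANDEMIC = {
    "coronavirus": 1980,
    "coronavirosis": 1992,
    "coronaviral": 1997,
    "coronavirología": 2012,
}

shared = DHLE & DLE_23_4                     # recorded in both dictionaries
only_in_dhle = DHLE - DLE_23_4 - DLE_23      # recorded in DHLE alone
neologism_candidates = DHLE - set(PRE_PANDEMIC)

print(len(DHLE), len(shared), len(only_in_dhle), len(neologism_candidates))
```

Under these assumptions the sets reproduce the figures given in the text: thirty-one words in DHLE, two shared with DLE 23.4, and twenty-seven without documented instances prior to the pandemic.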

None of the dictionaries incorporates any foreign words connected with the 
pandemic. 

In Table 1, the thirty-one words identified are distributed geographically based 
on the data recorded only in DHLE, given that that dictionary is the only one of the 
three sources that features all the words, and also locates each documented in- 
stance geographically. This table also has a chronological reference, as DHLE indi- 
cates the year of the documented use. 

The geographical indications are divided into four large regions: Europe, North 
America, Central America and the Caribbean, and South America. This division, 
however, precludes the drawing of reliable conclusions or trends regarding the 
scope of use. Moreover, as previously noted (Bernal et al. 2020), applying exclu- 
sively a frequency criterion is not enough in the usual practice for updating lan- 
guage dictionaries, so that the geographical data gathered here has illustrative 
rather than descriptive value. 
Table 1: Geographical and chronological distribution of the documented uses of the thirty-one
corona- words in DHLE.


Word | Europe | North America | Central America and the Caribbean | South America | Total | 2020 | 2021 (as of 30/05)
Coronaplauso 4 4 2 2 
Coronabebé 6 6 6 
Coronabicho 4 3 7 6 1 
Coronaboda 3 2 5 4 1 
Coronabono 6 2 8 8 
Coronabulo 11 11 9 2 
Coronachikunguña 1 1 1 
Coronachivato 2 2 1 1 
Coronacompra 1 2 1 1 5 4 1 
Coronacrisis 4 6 10 8 2 
Coronadengue 2 5 7 4 3 
Coronadiccionario 1 1 1 
Coronadivorcio 2 2 1 5 5 
Coronafiesta 2 2 2 6 4 2 
Coronafobia 5 1 2 8 6 2 
Coronahisteria 5 1 6 5 1 
Coronahistérico, a 2 2 2 
Coronalengua 1 1 2 2 
Coronalenguaje 1 1 2 2 
Coronamanía 1 1 2 2 
Coronacionalismo 4 4 4 
Coronapositivo, a 4 4 4 
Coronasalmonela 1 1 1 
Coronasalmonella 1 1 1 
Coronavírico, a 7 1 8 6 2 
Coronavirólogo, a 4 1 1 6 4 2 
Coronaviroso, a 1 4 5 4 1 


TOTALS 70 15 8 36 129 103 26 
Only one word (coronacompra) is recorded in every region, which does not
mean that it is used in every country of those regions. 

While the criteria for the inclusion/exclusion of neologisms in DHLE are not made 
explicit, an inductive analysis of the recorded instances of the thirty-one words studied 
allows us to conclude that only one documented instance is needed in order to be in- 
cluded in DHLE, as in Table 1 there are four cases with a single documented instance 
(coronachikunguña, coronadiccionario, coronasalmonela, and coronasalmonella). 

The maximum number of documented instances is eleven, and there is only one 
word with that many instances. This may reflect the very recent emergence of all of 
these words, in line with the unfolding of the pandemic. Moreover, of the 129 docu- 
mented instances in total, 103 are from the year 2020 (80 percent) and 26 (20 percent) 
from the year 2021, although the analysis for 2021 only goes up to May 30. 
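
The arithmetic behind these observations can be retraced with a short script. This is a sketch under stated assumptions: the per-word totals are my reading of the Total column of Table 1 (headwords reduced to their masculine base form), the yearly figures are taken from the TOTALS row, and all variable names are illustrative.

```python
# Total documented instances per word, read from the Total column of Table 1.
# Assumption: this reading of the flattened table is correct.
totals = {
    "coronaplauso": 4, "coronabebé": 6, "coronabicho": 7, "coronaboda": 5,
    "coronabono": 8, "coronabulo": 11, "coronachikunguña": 1,
    "coronachivato": 2, "coronacompra": 5, "coronacrisis": 10,
    "coronadengue": 7, "coronadiccionario": 1, "coronadivorcio": 5,
    "coronafiesta": 6, "coronafobia": 8, "coronahisteria": 6,
    "coronahistérico": 2, "coronalengua": 2, "coronalenguaje": 2,
    "coronamanía": 2, "coronacionalismo": 4, "coronapositivo": 4,
    "coronasalmonela": 1, "coronasalmonella": 1, "coronavírico": 8,
    "coronavirólogo": 6, "coronaviroso": 5,
}
by_year = {2020: 103, 2021: 26}  # TOTALS row; 2021 runs only to 30 May

grand_total = sum(totals.values())
# Share of each year in the documented instances, rounded to whole percent.
shares = {year: round(100 * n / grand_total) for year, n in by_year.items()}
# Words admitted on the strength of a single documented instance.
singletons = sorted(word for word, n in totals.items() if n == 1)
# Word with the highest number of documented instances.
most_documented = max(totals, key=totals.get)

print(grand_total, shares, singletons, most_documented)
```

Read this way, the per-word totals add up to the 129 instances of the TOTALS row, four words rest on a single documented instance, and coronabulo is the only word that reaches eleven.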

While we understand the logic intended for the progressive construction of DHLE,
as far as neologism candidates are concerned this repertoire, in addition to acting
as a historical dictionary, also becomes, in practice, a refined emergency
dictionary¹⁹ that gathers new words (even some whose validation could be
questionable) that in the future could either become lemmas of DLE or be discarded
outright for not meeting the usual criteria for incorporation into the main
dictionary of the Spanish language.


5 Final considerations 


A pandemic of the magnitude of the one that broke out in 2019, and with the speed 
at which it has spread around the world, is unusual. With the virus, the disease, 
and the social and economic changes brought on by the pandemic, words and 
terms also spread very rapidly, quickly capturing the attention of linguists and lex- 
icographers. The academic dictionaries of the Spanish language are facing this situ- 
ation in an unprecedented technological and methodological scenario for Spanish 
academic lexicography, which allows for an aggiornamento never seen before with 
respect to the updating practices both for DLE and DHLE. Two absolutely excep- 
tional situations thus converged: the pandemic and a modern updating system. 

At the same time, as the possibility of updating the DLE coincides with a period of
momentum in the DHLE, the academic outlook is all the more singular. DHLE operates


19 This term stands for the Spanish phrase “diccionario de emergencia” or “diccionario de urgencia”
(emergency or urgency dictionary), as it is used on websites such as
http://www.intranet.senasa.gov.ar/intranet/imagenes/archivos/prensa/caja herramientas/Diccionario de Urgencia.pdf or
https://www.meneame.net/m/actualidad/mami-guillao-masacote-diccionario-urgencia-descifrar-canciones or
https://www.consumer.es/economia-domestica/finanzas/diccionario-de-urgencia-para-entender-que-ocurre-con-los-bancos-espanoles.html
(last access: 10 June 2022).
methodologically as an emergency dictionary without abandoning its role as a historical
dictionary, while DLE functions as an exclusion corpus. DHLE plays the role of witness
corpus, in which all lexical creations are recorded. By recording neologisms it is writing
the history of these
words, bearing witness to terms that may or may not remain in the language, but 
whose appearance and disappearance can be dated and documented. DLE will in- 
corporate only some of these words added by DHLE. It will incorporate only those 
that have stability, that can still be observed after a reasonable period, that have 
resisted the passage of time. All the words that remain in DHLE and are not incorpo- 
rated into DLE are in quarantine, or, more precisely, in limbo, because it is a quar- 
antine that is not necessarily going to end. They may remain there for an indefinite 
period of time, unless usage determines otherwise. 

In this particular aspect, DHLE can be compared to the Diccionario manual e ilustrado
de la lengua académica (1989) (Manual and Illustrated Academic Language Dictionary),²⁰
in that it distinguishes, as Alvar noted (1992), the neologisms-general/usual
language dictionary relationship from the neologisms-manual dictionary relationship.
Alvar understands that the manual dictionary gathers new words “aware that they 
could be a vocabulary that will have a fleeting existence in the general language.” And 
he adds, “this is a necessary process: these words may disappear without leaving any 
other trace than the ephemeral presence of a limited use, but they may become wide- 
spread in their use and this non-normative repertoire will have been the anteroom for 
accessing the Diccionario usual [DLE]” (Alvar 1992). The idea of an “anteroom” can 
also be applied to DHLE. 

Moreover, it should be noted that in DLE there is a reluctance to incorporate 
words, because there is a clear awareness of the difficulty involved in removing a 
word from the dictionary once it is included. In contrast, DHLE does not face a hori- 
zon in which it will be necessary to discuss whether or not a word is removed from 
the dictionary: it only needs to document the year of its last recorded use. 

DLE aims, as it should, to be increasingly user-friendly for native speakers,
who, incidentally, are its intended audience; DHLE, by dating the first documented
instance of the word in question, becomes a very user-friendly dictionary
for researchers. Researchers often find it difficult to discern in DLE the principles
and criteria that govern incorporations into the dictionary. These principles and
criteria are not made explicit for the 2020 incorporations either, but it is clear that
DLE has adapted to the emergence of the pandemic and has taken advantage of the
technological resources available, which are, moreover, part of its new institutional
policies.

The analysis of the nature of the new pandemic-specific expressions reveals
that only two words in DLE (coronavirus and coronavírico, ca) are, strictly speaking,


20 Cf. also Diccionario esencial de la lengua española (2006). 
specialized terms: they receive a modern lexicographic treatment, which avoids the
difficulties that defining specialized words posed in the past.

These two words are also recorded in DHLE, but only coronavirus has the label 
‘Med.’ As noted, other words beginning with CORONA- could be assigned that label too. 

The new additions open a debate on the treatment of neologisms in lexicography,
in a singular scenario. It could be said that the changes made in DLE as a result
of the coronavirus pandemic in a way rekindle old discussions regarding the
criteria and methods used by DLE to select, incorporate, and define
expressions belonging to specialized areas.

The pandemic represents an opportunity for lexicography and terminology re- 
searchers to discuss and propose consistent solutions for the incorporation of scien- 
tific and specialized words into DLE and other Spanish dictionaries. In this regard, 
it offers a chance to leave behind vague criteria for incorporating or excluding sci- 
entific terms, scientific definitions not easily understood by a regular audience, 
conceptual inaccuracies, and somewhat erratic assignments of thematic labels, 
among other criticisms that DLE has received. 


Annex I
Diccionario histórico de la lengua española, 10.ª Entrega (marzo de 2021), Versión del 31/03/2021

coronavírico, a adj. (2020-)

Etim. Derivado de coronavirus e -ico, a.

Se documenta por primera vez, con la acepción 'perteneciente o relativo al coronavirus o a las circunstancias y
la época de la pandemia de coronavirus', desde febrero de 2020, en las primeras noticias publicadas en la
prensa sobre las consecuencias que la expansión del virus estaba provocando en Europa, como se refleja en El
Confidencial (Madrid). Como '[persona] que tiene coronavirus' se registra en el artículo "¿Coronavírico o
coronaviroso?: la Fundéu opina ante el Covid-19" publicado en Redacción Médica (Madrid), en el que se
muestra la vacilación entre coronavírico y coronaviroso. Se consigna en la versión 23.4. del Diccionario de la

1. adj. Perteneciente o relativo al coronavirus. > coronavirus + -ico, a

Sinónimos: coronaviral; coronaviroso, a

docs. (2020-2021) 8


Figure 1: https://www.rae.es/dhle/coronavírico.
Diccionario histórico de la lengua española, 10.ª Entrega (marzo de 2021), Versión del 31/03/2021

coronavirus s. (1980-)

Etim. Voz tomada del inglés coronavirus, atestiguada en esta lengua al menos desde 1968, en un artículo
científico publicado en Nature (véase OED, s. v.).

Se documenta por primera vez, con la acepción 'virus de la familia Coronaviridae, compuesto por un núcleo de
ARN y cubierto por una corona de glucoproteínas, que causa enfermedades respiratorias e intestinales en
personas y animales', en 1980, en la Guía de enfermedades de los cerdos de J. A. Chipper. Se consigna por
primera vez en el Vocabulario científico y técnico (1983) de la Real Academia de Ciencias Exactas, Físicas y
Naturales. Por metonimia, pasa también a denominarse coronavirus la enfermedad que provoca, como muestra,
en 1997, un anuncio publicado en Nuevo Heraldo (Aranjuez), donde se emplea con el valor 'enfermedad

1. s. m. Microb. Med. Virus de la familia Coronaviridae, compuesto por un núcleo de ARN y cubierto por
una corona de glucoproteínas, que causa enfermedades respiratorias e intestinales en personas y animales. ac. etim.

docs. (1980-2021) 25


Figure 2: https://www.rae.es/dhle/coronavirus.


Real Academia Española, Diccionario de la lengua española, Edición del Tricentenario

coronavirus

Del ingl. coronavirus, de corona ‘corona solar’, por el aspecto del virus al microscopio, y este del lat.
corōna ‘corona’, y virus ‘virus’, y este del lat. virus ‘veneno’, ‘ponzoña’.

1. m. Med. Virus que produce diversas enfermedades respiratorias en los seres humanos, desde
el catarro a la neumonía o la COVID.


Figure 3: https://dle.rae.es/coronavirus.
Real Academia Española, Diccionario de la lengua española, Edición del Tricentenario

coronavírico, ca

1. adj. Med. Perteneciente o relativo al coronavirus.


Figure 4: https://dle.rae.es/coronavírico.
Judit Papp
How the COVID-19 pandemic is changing the Hungarian language:
Building a domain-specific Hungarian/Italian/English dictionary
of the COVID-19 pandemic


1 Introduction 


This paper presents the main issues connected with the creation of a trilingual
Hungarian-Italian-English dictionary of the COVID-19 pandemic using Lexonomy.¹
My aim is not only to create a coronacorpus (in Hungarian, I propose my own
corona-neologism or ‘coroneologism’:² koronakorpusz) and a dictionary of
equivalents, but also to understand how the different waves and phases of the COVID-19
pandemic are changing the Hungarian language, to detect the Corona-, COVID-,
pandemic-, virus-, mask-, quarantine-, and vaccine-related neologisms, and to offer an
overview of the most frequent or linguistically interesting Hungarian neologisms
and multiword units related to COVID-19.

For the creation of the Hungarian/Italian/English dictionary of the COVID-19
pandemic (hereinafter referred to as the Trilingual (HU, IT, EN) COVID-19 Dictionary,
TCD), I used a specialized coronacorpus extracted from the Web using Sketch Engine.³
To detect the related terms, I also analyzed the Hungarian web corpora of news
articles (online press) obtained from crawling a list of RSS feeds (Timestamped JSI
web corpus).⁴ It is already highly evident that the vocabulary used in these articles
(in the printed versions as well as in online press and media) differs considerably
from that of the past. In fact, it is possible to note a frequency increase (for a short
period, such as from March to the end of May 2020, or for a longer period, such as
from March to the end of 2020) for certain word forms that are to some extent related
to the all-encompassing COVID-19 pandemic. It is also possible to discover word
forms that, before the outbreak of the pandemic, had never been seen in the everyday


1 https://www.lexonomy.eu/p8mwspck/ (last access: 10 June 2022). 

2 The COVID-19 inspired neologisms or ‘coronacoinages’ are sometimes also referred to as
‘coroneologisms’, e.g., in papers written by Roig-Marín (2020). Previously, the term ‘coroneologism’
appeared in newspaper articles (e.g. Coroneologisms are going viral. In: Economic Times. April 9, 2020).

3 https://www.sketchengine.eu/ (last access: 10 June 2022).

4 https://www.sketchengine.eu/jozef-stefan-institute-newsfeed-corpus/ (last access: 10 June 2022). 
Hungarian language. Many terms that usually belong to the medical and scientific
fields (epidemiology, virology, serology, etc.) are being used in everyday language (in 
the press but also in informal contexts). 
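
The two observations above (a frequency increase over a period, and word forms unattested before the outbreak) can be illustrated with a small sketch. It does not reproduce the Sketch Engine workflow used for the TCD: the functions, the cutoff month, and the toy Hungarian sample are all hypothetical and serve only to make the idea concrete.

```python
from collections import Counter, defaultdict


def relative_frequencies(docs):
    """Return {month: {word form: frequency per million tokens}}.

    docs is an iterable of (month, tokens) pairs, with month as 'YYYY-MM'.
    """
    counts = defaultdict(Counter)
    for month, tokens in docs:
        counts[month].update(token.lower() for token in tokens)
    result = {}
    for month, counter in counts.items():
        size = sum(counter.values())
        result[month] = {w: n * 1_000_000 / size for w, n in counter.items()}
    return result


def new_word_forms(docs, cutoff="2020-03"):
    """Return word forms attested from the cutoff month on but never before it."""
    before, after = set(), set()
    for month, tokens in docs:
        # 'YYYY-MM' strings compare chronologically when compared as text.
        target = before if month < cutoff else after
        target.update(token.lower() for token in tokens)
    return after - before


# Hypothetical toy sample standing in for timestamped news texts.
sample = [
    ("2020-01", ["a", "vírus", "terjed"]),
    ("2020-04", ["a", "karantén", "alatt", "maszk", "kell"]),
    ("2020-05", ["a", "koronavírus", "és", "a", "karantén"]),
]

frequencies = relative_frequencies(sample)
candidates = new_word_forms(sample)
print(frequencies["2020-05"]["karantén"], sorted(candidates))
```

On a sample this small, ordinary words are flagged as new simply because the earlier period is tiny; with a large timestamped corpus, the same comparison would have to be combined with manual inspection of the candidates.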

For the domain-specific terminology extraction, I used the Oneclick Dictionary 
function of Sketch Engine and created the first drafts of TCD. 

From the dictionary drafts, I extracted the headwords related to the pandemic and
included them in the TCD. I customized the structure and formatting of the dictionary
in Lexonomy and configured the connection with my Sketch Engine account so that
example sentences could be pulled from Sketch Engine.

Finally, I completed the entries with the Italian and English equivalents and 
the corresponding examples taken from the Web as well as from the corresponding 
Timestamped JSI web corpora. 


2 Field of study 


Studies and research dedicated to the methodical lexicographic treatment of Hun- 
garian terms related to the COVID-19 pandemic are still rather uncommon. The ex- 
isting glossaries or dictionaries are usually monolingual (Hungarian) or bilingual 
(Hungarian-English). Except for my own Trilingual (HU, IT, EN) COVID-19 Dictionary 
(TCD), there is no other existing Hungarian-Italian dictionary on COVID-19.

Following the first wave of the pandemic, in 2020, a dictionary on the lexicon
of COVID-19 was published in Hungary by Ágnes Veszelszki, the Karanténszótár,
which collects 400 neologisms (words and expressions) that appeared in the
Hungarian language between January and July 3, 2020. Each lemma is accompanied
by an explanation and examples taken from real texts. Besides the most commonly
used words and expressions, Veszelszki has also included rather rare forms
as well as hapax legomena in her dictionary. The dictionary, accompanied by a 
short essay, is an authentic and important testimony of the period under review, as 
it offers users a detailed view of the Hungarian linguistic aspects of the COVID-19 
pandemic. In addition, it constitutes a valuable source for further linguistic reflec- 
tions on the formation of neologisms in the Hungarian context. The essay is also 
interesting for the lexicon used by Veszelszki, as it is profoundly influenced by the 
pandemic and has its own related neologisms, namely karanténszótár “dictionary 
on quarantine”, karanténszókincs “lexicon on quarantine”, kórlenyomat “imprint of 
the disease”, and karanténkor “period of quarantine”. These neologisms appear
in the title, introduction, or essay of the publication, but the author neither
lemmatizes nor defines them among the items collected.

Another noteworthy publication on the challenges posed by COVID-19 and the
various responses to the pandemic is Globális kihívás – lokális válaszok (Global
challenge – local responses) edited by László Kovács (2020), which includes a section
dedicated to articles that reflect on the new phenomena in the Hungarian language
(Balázs; Domonkosi-Ludányi; Kegyes-Lanzmaier-Ugri and Lénárt).

With the creation of the TCD, my aim has been to fill this lexicographic gap primar- 
ily concerning the Hungarian-Italian language pair and to organize this content in a 
free online tool (a rich database) that is easy to search and useful for linguists and 
translators. The dictionary created with Lexonomy makes it possible to store, maintain, 
and update data in an organized manner. The third language is English, as the comparison
with it is inevitable. On the one hand, this is due to the enormous quantity of
news produced and conveyed, with extraordinary speed, by international agencies, a
phenomenon that exerts a considerable influence on other languages; on the other
hand, English is the language of the international scientific community, including
international medical research. The papers, findings, and results of scientists’
experiments relating to COVID-19 are published in English, which means that
English plays an important role in the creation of neologisms. In both Hungarian and 
Italian, we record a certain number of loans, calques, and adaptations, but we also 
have to deal with the needs of ordinary people and the creative abilities of individual 
languages. 


3 Methodology 
3.1 Corpus Selection 


The COVID-19 Open Research Dataset (CORD-19), which is available on Sketch Engine,
consists of a collection of texts in English. As of November 2021, I had not found
any Hungarian or Italian COVID-19-related corpus or any Hungarian-Italian
COVID-19-related dictionary.

Therefore, to create TCD, I decided to build my own COVID-19-related Hungarian
corpus (in Hungarian koronakorpusz) using Sketch Engine and starting from
the Web, as it represents an enormous resource (“web as corpus”, cf. Kilgarriff 2001,
Kilgarriff-Grefenstette 2003). The Hungarian coroneologisms and words related to
the COVID-19 pandemic are detected in this corpus, which was built using all
three Sketch Engine options for adding content to a corpus:

- content downloaded by providing some typical words defining the topic (seed 
words), 

- content downloaded by providing a list of URLs for download, 

- content downloaded by downloading a complete website. 
Content downloaded by providing the typical words that define the topic
(seed words) 
(“Find texts on the Web” option, input type: “Web search”) 


As a result of this option, Sketch Engine extracted a series of web pages and documents.
In “Web search”, I entered groups of words and phrases (maximum 20) to define
the topic of the new corpus. As the pandemic progressed through its successive
phases and waves, I used seed words such as: Astrazeneca, átoltottság ‘vaccination
coverage rate’, COVID, COVID-19, COVID-igazolvány ‘COVID certificate’, delta,
deltavírus ‘delta variant’, digitális ‘digital’, fertőtlenítés ‘disinfection’, fertőzés
‘contagion, infection’, fertőzött ‘infected’, görbe ‘curve’, harmadik ‘third’, hullám
‘wave’, immunitás ‘immunity’, járvány ‘epidemic’, karantén ‘quarantine’, koronavírus
‘coronavirus’, Moderna, mutáció ‘mutation’, oltás ‘vaccination’, oltásellenes ‘anti-vax’,
oltáspárti ‘pro-vax’, oltópont ‘vaccination point’, Pfizer, Sputnik, szájmaszk ‘mask’,
távolságtartás ‘social distancing’, tömeges ‘massive’, vakcina ‘vaccine’,
vakcinabeszerzés ‘vaccine procurement’, védettség ‘immunity’, védőoltás ‘vaccine’,
vírusvariáns ‘virus variant’, etc.

An advantage of Sketch Engine is that the corpus building tool can be run many
times to make the corpus increasingly larger. The search can be repeated with the
same seed words or with different ones, and the seeds may include multiword
expressions (enclosed in quotation marks) or proper names of different kinds. The
seeds are randomly selected and sent in groups of three to the Bing search
engine. The Web pages that Bing returns are downloaded and processed into a corpus.
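
The seed-based step can be pictured with a minimal sketch. It only illustrates how random groups of three seeds could be turned into search queries; it is not Sketch Engine's own code, and the function name seed_queries and its parameters are hypothetical. The seeds are a subset of the list given above.

```python
import random

# A few of the seed words listed above; any subset would do.
SEEDS = ["koronavírus", "karantén", "járvány", "oltás",
         "vakcina", "szájmaszk", "távolságtartás", "védettség"]


def seed_queries(seeds, n_queries, group_size=3, rng=None):
    """Return n_queries query strings, each joining group_size distinct seeds.

    Hypothetical helper: it mimics the idea of sending random seed triples
    to a search engine, nothing more.
    """
    rng = rng or random.Random()
    queries = []
    for _ in range(n_queries):
        group = rng.sample(seeds, group_size)  # no seed repeated within a query
        # Multiword seeds are quoted so that they are searched as a phrase.
        queries.append(" ".join(f'"{s}"' if " " in s else s for s in group))
    return queries


if __name__ == "__main__":
    for query in seed_queries(SEEDS, 5, rng=random.Random(0)):
        print(query)
```

Running the generator repeatedly with the same or different seeds yields new combinations each time, which is why repeated runs keep enlarging the corpus.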


Content downloaded by providing a list of URLs that should be downloaded 

I have also collected Hungarian language data from relevant URLs (e.g. blogs, forums,
general websites on COVID-19, etc.). The main criterion for inclusion in the
corpus is that the texts deal with topics related to the pandemic.


Content downloaded by downloading a complete website 
In particular, I downloaded a few websites (on July 12, 2021) containing useful
information on the topic.⁵


5 The downloaded websites are: (i) https://koronavirus.gov.hu/, the official governmental portal in
Hungarian on COVID-19, created together with the Operational Force responsible for the Prevention of the
COVID-19 pandemic (Koronavírus-járvány Elleni Védekezésért Felelős Operatív Törzs) on January 31, 2020.
The Ministry of Interior of Hungary is responsible for the operation of the portal and the Prime Minister's
Office is responsible for editing the content; (ii) https://www.covid1001.hu/: in the middle of March 2020,
a group of medical translators (specialists, biologists, pharmacists, epidemiologists, language specialists)
decided to combat misinformation by translating and publishing reputable articles;
(iii) https://semmelweis.hu/koronavirus/, Semmelweis University's website on the novel coronavirus, which
is constantly updated with the latest news, information, communications, instructions, and actions
concerning university citizens; (iv) https://www.elte.hu/content/koronavirussal-kapcsolatos-tajekoztatok-cikkek.c2c.316:
Eötvös Loránd University’s website that contains information on COVID-19 (updated: June 30, 2021);
With the help of its corpus building tools, Sketch Engine processed many Web
pages and documents and built the Hungarian ‘coronacorpus’ (about 4 million words). 

For the domain-specific terminology extraction, I used the Oneclick Dictionary
function of Sketch Engine and created the first drafts of TCD. The Oneclick Dictionary
is useful in automating the exchange of lexicographic data between the selected
Sketch Engine corpus and a Lexonomy dictionary (Měchura 2017), even if
post-editing is required. Besides my own specialized corpus, I analyzed the following
Hungarian web corpora of news articles obtained by crawling a list of RSS
feeds: Timestamped JSI web corpus 2014-2020 Hungarian and Timestamped JSI web
corpus 2021-01 Hungarian. “The JSI Newsfeed corpus is a new family of Web corpora
created from the JSI newsfeed of Jozef Stefan Institute, Slovenia [. . .]. JSI
newsfeed is a clean, continuous, real-time aggregated stream of semantically enriched
news articles from RSS-enabled sites across the world.” (Bušta et al. 2017).
The corpora are tagged by TreeTagger v2.

Concerning Timestamped JSI web corpus 2014-2020 Hungarian, I have also created
a sub-corpus that contains only articles from 2020, comprising 309,663,951 tokens
and 256,156,393 words. The Timestamped JSI web corpus 2021-01 Hungarian contains
113,132 documents, comprising 34,378,246 tokens, 28,376,390 words, 1,624,519
sentences, and 699,713 paragraphs. Although these corpora are obviously not exhaustive,
given these figures and the wide coverage of Hungarian language sources, I conclude
that they are large enough for analyzing the phenomena and trends
in the Hungarian online press.

My coronacorpus is useful in detecting the Hungarian coroneologisms (accepted
by the speech community) and occasionalisms (or ‘nonce words’ coined for
a particular occasion, e.g. aranymaszk ‘gold mask’) used not only in newspaper
articles and standard Hungarian texts (everyday, neutral, unmarked) but also on
other websites (government, homepages, school/university, etc.), blogs, and social
networks (Facebook, Instagram, etc.). In this way, colloquial language (slang,
informal, familiar) and formal language (scientific, specialized, academic, literary)
will also be represented in TCD.

The Hungarian Timestamped JSI web corpora are an outstanding tool for observing the
behavior of words or single word forms. “Trends” is a feature of Sketch Engine
“for detecting words that undergo changes in the frequency of use in time (diachronic
analysis). Trends identify words whose use increases or decreases in time.”⁶
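
The idea behind such a diachronic analysis can be illustrated with a minimal sketch: compute the relative frequency of a word in each time slice and fit a least-squares line through the values, so that a positive slope signals increasing use. This is an assumption-laden illustration, not the method Sketch Engine actually implements, and the monthly numbers are toy values, not corpus data.

```python
def relative_freq(hits, corpus_sizes, per=1_000_000):
    """Hits per `per` tokens for each time slice."""
    return [h * per / n for h, n in zip(hits, corpus_sizes)]


def slope(values):
    """Least-squares slope of values plotted against their index 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Toy numbers only (NOT taken from the corpora): hits and tokens per month.
toy_hits = [0, 0, 2, 40, 55, 60]
toy_sizes = [25_000_000] * 6

freqs = relative_freq(toy_hits, toy_sizes)
print(f"slope: {slope(freqs):+.3f} hits per million tokens per month")
```

Normalizing by the size of each time slice matters because the monthly sub-corpora differ in size; raw hit counts alone would confuse corpus growth with growth in use.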

Alongside this feature, ‘Concordance’ is also useful; in particular, its ‘Distribution of hits
in the corpus’ function provides highly informative results. The ‘Word Sketch’ option, a


(v) https://europa2000.hu/covid-19/, the COVID-19 section of the website operated by the Europa
2000 Secondary School (Budapest), which is a secondary grammar school and vocational institution
maintained by a foundation; and (vi) https://www.pfizer.hu/, the Hungarian version of the
institutional site of Pfizer, one of the world’s premier innovative biopharmaceutical companies.

6 https://www.sketchengine.eu/guide/trends/#toggle-id-6.
one-page summary of the word's grammatical and collocational behavior, is another
helpful feature (active in my own corpus, not available in the Timestamped corpora).⁷

A good example for illustrating the “Trends” feature is the Hungarian coroneologism
nyunyóka. The explosive growth of its frequency is closely related to the pandemic.
A nyunyóka can be anything that is safe for a baby or toddler to have at
sleep time. It is a sort of comfort or transitional item: a blanket, stuffed animal,
or other comfort object of affection that a baby or toddler brings to bed and that
provides comfort and soothing. Previously, the term nyunyóka was uncommon and
was used only in baby talk. Then Chief Medical Officer Cecília Müller, speaking at a
press conference of the Operational Force, addressed the personal hygiene habits to
teach kids and the necessity to wash comfort objects frequently; owing to the massive
media impact of her discourse, this neologism entered the common language and
became widely known and used. The number of hits found in the corpus is 166, for
a lemma present only since May 13, 2020.⁸ Müller shared these tips instead of the
daily COVID numbers (mortality and recovery rates, current active cases, etc.) that
people were actually expecting. A Google search for the term now lists
46,500 pages (as of December 16, 2021).

Besides the Timestamped JSI web corpora, the Web is a valuable corpus to find 
coroneologisms and forms belonging to Hungarian slang or to the colloquial regis- 
ter. The latter forms are usually not represented in current corpora typically based 
on news articles, which is why the creation of the Hungarian coronacorpus is impor- 
tant for this research. 

From these drafts, I extracted the headwords related to the pandemic and in- 
cluded them in TCD. I customized the structure and formatting of the dictionary in 
Lexonomy and configured the connection with my Sketch Engine account so that 
there is an option to extract and pull example sentences from it. This option allows 
you to detect, select, and pull not only definitions and descriptions of the Hungarian 
coroneologisms (new words, new meanings of existing words, and new multiword 
units) into Lexonomy, but also collocates and collocations, etc. While building the 
dictionary, particular attention is paid to neologisms related to aspects regarding the 
outbreak of the pandemic, lockdowns, curfews, quarantines, social distancing, good 
hygiene practices, epidemiological curves, smart working, distance learning, first 


7 However, in “Show visualization”, it would be great if the image could be editable by the user. 

8 From Müller's discourse (https://index.indavideo.hu/video/Csenjuk el a gyermek nyunyokajat):
“Tudjuk jól, hogy a piciknél van valamiféle ragaszkodás: itt nemcsak a cumikra gondolok, hanem kis
pelenkára, vagy nyunyókára, amit ő otthonról hoz és nagyon szereti. Próbáljuk meg ezeket otthon
gyakran tisztítani, elcsenni ameddig alszik a gyermek és ezeket kimosni és vasalással még egy
hőkezelésnek alávetni, ami szintén fertőtlenítő hatású.” (We know very well that little ones have some
kind of attachment. Here, I am thinking not only of pacifiers, but also of the little diaper or
comfort object that the child brings from home (to the nursery) and loves very much. Let's try to clean
them frequently at home, sneak them away while the child is asleep, wash them, and
subject them to heat treatment by ironing, which also has a disinfectant effect.)
wave, second wave, burden on healthcare systems, vaccines and vaccine efficacy,
third wave, fourth wave, variants, green pass, and the EU digital COVID-19 certificate.
Most of the new words came into existence in these areas. Common
Hungarian terms that are important for understanding the COVID-19 pandemic are
also included in the dictionary.

As far as the Italian and English equivalents are concerned, I queried the
available Timestamped JSI web corpora for Italian and English and
the above-mentioned COVID-19 Open Research Dataset (CORD-19): Timestamped JSI
web corpus 2014-2020 Italian, Timestamped JSI web corpus 2021-01 Italian,
Timestamped JSI web corpus 2014-2020 English, Timestamped JSI web corpus 2021-01
English, and CORD-19.


3.2 Terminological extraction 


To extract as many COVID-19-related terms as possible from my COVID-19-related
Hungarian corpus and the Timestamped JSI web corpora, I used the ‘Keywords’
function (terminology extraction) that is available on Sketch Engine,
downloaded and analyzed the ‘Wordlist’ (frequency list), and used the ‘Concordance’
function. In particular, in ‘Wordlist’ (BASIC tab), I searched for certain strings, such
as COVID, korona, karan (from karantén ‘quarantine’), járvány (‘pandemic’), vírus, fert
(from fertőz ‘infect’), beteg (‘ill’), véd (‘protect’), olt (‘to vaccinate’), vakcina (‘vaccine’),
and immun, to gauge the productivity of the corresponding lemmas.
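
A string search of this kind over a downloaded frequency list can be sketched as follows. The sketch is only an illustration of the principle, not the Sketch Engine interface: the function names are hypothetical, and the sample list reuses a handful of the 2020 hit counts reported in Section 4.

```python
# (lemma, hits) pairs; the counts are the 2020 hit numbers reported in Section 4.
WORDLIST = [
    ("maszkviselés", 9913), ("szájmaszk", 9774), ("védőmaszk", 3605),
    ("arcmaszk", 3110), ("maszkhasználat", 1805), ("arcpajzs", 478),
    ("textilmaszk", 413),
]


def containing(wordlist, string):
    """Return the (lemma, hits) pairs whose lemma contains `string` (case-insensitive)."""
    needle = string.casefold()
    return [(lemma, hits) for lemma, hits in wordlist if needle in lemma.casefold()]


def summarize(matches):
    """Number of distinct lemmas (types) and their summed hits (tokens)."""
    return len(matches), sum(hits for _, hits in matches)


matches = containing(WORDLIST, "maszk")
types, tokens = summarize(matches)
print(f"{types} lemmas containing 'maszk', {tokens} hits in total")
for lemma, hits in sorted(matches, key=lambda m: -m[1]):
    print(f"{hits:>6}  {lemma}")
```

The number of distinct lemmas that contain a string gives a rough indication of its productivity, while the summed hits show how much actual use those formations account for.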


3.3 Draft dictionary and formatting 


Again, with my own COVID-19-related Hungarian corpus, I used the One-Click dictionary
(automatic dictionary drafting) function of Sketch Engine to create my draft
dictionary for Hungarian. The result was useful: from the draft, I extracted many
headwords related to the pandemic. However, once I had learned how to use
Lexonomy and how to configure and customize the dictionary structure and
formatting, I preferred to create a new, empty dictionary using the ‘Create a
dictionary’ option and to insert the data manually, entry by entry. This method is
time-consuming, but the resulting content is of higher quality. I have also configured
the connection with my Sketch Engine account to link TCD with one of the corpora and
to supplement the information available for individual terms or expressions.
3.4 The structure of TCD


Headwords consist not only of single words, but they also include particularly fre- 
quent or relevant multiword expressions (MWEs). Less frequent MWEs are pre- 
sented as collocations of the headword or among the examples. 

TCD is linked to the original corpus in Sketch Engine, and it is possible to de- 
tect, select, extract, and automatically pull definitions, examples of usage, colloca- 
tions, and thesaurus items of the Hungarian coroneologisms from my corpus into 
Lexonomy. 

Common Hungarian terms that are important for understanding the COVID-19
pandemic are also included in the dictionary, such as járvány ‘epidemic’, vírus
‘virus’, fertőzött ‘infected’, etc.

The Italian and English equivalents are added manually along with useful ex- 
amples taken from texts on the Web. 


4 First results 


The use of several pre-existing occasional words and expressions has increased sig- 
nificantly during the COVID-19 period, while neologisms linked to the pandemic 
were coined with surprising speed (e.g. covidiot, coronababy, zoom-kocsma ‘virtual 
pub in Zoom’, fotelvirológus ‘armchair virologist’, etc.). 

The lexical innovation resulting from the outbreak of the pandemic is unparalleled,
as terms inspired by and/or linked to COVID-19 entered public consciousness
on a large scale. Faced with the new reality, the neologisms represent a functional
tool to discuss all of the different phenomena related to the pandemic: the impact 
that the pandemic and the crisis have on our lives, society, and economy, the expe- 
riences following restrictive lockdown measures, and the many themes related to 
distance learning or vaccines. They are also useful for expressing our feelings or 
making light of our experiences. 

While, on the one hand, words and expressions that have dominated the pan- 
demic-related discourse since the outbreak of the pandemic have an informative 
function, on the other hand, in a certain sense they also allow us to gain mutual 
understanding, to protect each other, to share warnings, to comment on events, to 
express and share with others anxieties, fears, worries, anger or exasperation. With 
their help, we can also make jokes, laugh, or make fun of this shared lexicon or even 
rid ourselves of fears. For this purpose, my COVID-19-related Hungarian corpus can be
useful, as it also contains language data from social media sites, forums, and blogs. In
such texts we register some of the most common Hungarian COVID-19-related words
with negative connotations, such as COVIDszopás ‘annoyance/unpleasant situation
due to COVID’ (szopás means ‘sucking’), COVID-tálibok ‘COVID-Talibans’ (also
karanténtálibok ‘quarantine Talibans’), COVID-fasizmus ‘COVID fascism’,
COVID-faszság or kovidfaszság ‘COVID bullshit’, COVIDgeci ‘unpleasant situation due to
COVID’, etc.


The outbreak of the emergency in Hungary 
and its maszk-related “viral” lexicon 


The second COVID-19 wave in Hungary began as early as August 2020, and
infections increased exponentially. During this second wave, the emphasis was
on the importance of wearing face masks (maszkviselés). Before the pandemic, the
word maszk ‘mask’ was present in the corpus in minimal proportions and with a
different meaning (‘a covering for the face that hides the person wearing it’), as
in the following example: “The robbers wore masks to hide their identities.” After
the outbreak of the pandemic, due to the mandatory wearing of face masks, the use
of maszk became widespread and its productivity exploded. In contrast to maszk
‘mask’, szájmaszk ‘mouth mask’, and arcmaszk ‘face mask’, the word arcpajzs ‘face
shield’ had no success and did not spread in the common language (it registered
only 478 hits in 2020). Among these words, the most frequent is maszk ‘mask’
(inflected forms included) with its 576,731 hits in the corpus. The following paragraphs
show the high productivity of the term maszk and the frequency of the corresponding
mask-related neologisms.

A group of these neologisms denotes different types of masks (maszktípus
‘mask type’ with 44 hits) according to (i) the area covered by the mask, (ii) its
functions, (iii) the materials it consists of, etc.:

(i) szájmaszk ‘mouth mask’ (9,774);⁹ arcmaszk ‘face mask’ (3,110); orr-szájmaszk
‘nose-mouth mask’ (20); orrmaszk ‘nose mask’ (5).

(ii) védőmaszk ‘protective mask’ (3,605); oxigénmaszk ‘oxygen mask’ (152);
sebészmaszk ‘surgical mask’ (100); légzőmaszk ‘respirator, breath mask’ (19);
vírusmaszk ‘virus mask’ (10); műtősmaszk ‘surgical mask’ (5).

(iii) textilmaszk ‘textile mask’ (413); szövetmaszk ‘mask made with fabric’ (142);
FFP-maszk ‘FFP mask’ (23); pamutmaszk ‘cotton mask’ (19); pleximaszk ‘plexiglass
mask’ (18); vászonmaszk ‘canvas mask’ (17); FFP3-maszk ‘FFP3 mask’ (5).

(iv) csodamaszk ‘miraculous mask’ (5).


A second group concerns the act of wearing the mask over the nose, mouth, and
chin. The noun maszkviselés ‘wearing a mask’ (9,913) refers to the act of wearing a
mask, as in A maszkviselés kötelező marad kültéren is. ‘Wearing (face) masks
remains mandatory also outdoors’. In the corpus, there are different synonyms and


9 The numbers in parentheses indicate the number of hits registered in 2020.
variants: maszkhasználat ‘usage of masks’ (1,805) and szájmaszkhasználat ‘usage of
mouth masks’ (7); maszkhordás ‘wearing of masks’ (338); maszkviselet ‘wearing of
masks’ (286) and szájmaszkviselet ‘wearing of mouth masks’ (5); szájmaszkviselés
‘wearing of mouth masks’ (71); arcmaszkviselés ‘wearing of face masks’ (8). To
these abstract nouns we can add the derivational suffix -i to create adjectives:
maszkviselési (1,246) and maszkviselési- (4) ‘mask wearing’. An example is
maszkviselési szabályok ‘mask wearing rules’. Other variants are maszkhasználati ‘mask
usage’ (61), as in maszkhasználati szabályok ‘mask usage rules’, or maszkhordási
‘mask wearing’ (42), cf. maszkhordási fegyelem ‘mask wearing discipline’;
szájmaszkviselési ‘mouth mask wearing’ (11). More complex neologisms are the abstract
noun maszkviselés-ellenesség ‘anti-mask wearing’ (2) and the adjective maszkviseléses
‘mask wearing’ (2), as in maszkviseléses élet ‘mask wearing life’.

A person wearing a mask is maszkos ‘masked’ (1,272); szájmaszkos ‘masked
with mouth mask’ (137); védőmaszkos ‘masked with protective mask’ (35);
arcmaszkos ‘face masked’ (22); maszkos-kesztyűs ‘masked and gloved’ (12).

The Hungarian word maszkviselő (34, present participle) can be used as an
adjective or as a noun, cf. Jómagam maszkviselő állampolgár vagyok ‘I am a mask
wearing citizen’ or Tudatos maszkviselő vagyok ‘I am a conscious mask wearer’.

The person who is not wearing a mask is maszktalan (adj) ‘without a mask’
(20), where -talan is a privative suffix, or maszknemviselő ‘person not wearing a
mask’ (4). It is possible to add to the adjective maszktalan another derivational
suffix to create the corresponding abstract noun maszktalanság ‘the condition of
wearing no mask’ (2). The two compounds maszknélküliség (4) and maszk-nem-viselés
‘non-mask-wearing’ (1) have the same meaning. The last word can also function as
a base for another adjective, maszknemviselési (adj) ‘not wearing masks’ (4), cf.
maszknemviselési vita ‘debate around not wearing masks’.

An adverbial derivational suffix may be added to the adjective maszkos ‘masked’
as well: maszkosan ‘in mask; wearing a mask’ (21). E.g.: Az viszont határozottan jó,
hogy a diákok maszkosan nem tudnak cigarettázni! ‘On the other hand, it is definitely
good that students cannot smoke when wearing masks!’

In Hungarian, a person (or a group) that does not agree with wearing masks
and spreads and encourages opinions against it is described as maszkellenes
(180, n, adj), maszkellenző (3), or maszktagadó (92) ‘anti-mask’, e.g. Magyarországon
is vannak komoly maszkellenes csoportok ‘There are also serious anti-mask groups
in Hungary’; Nem véletlen a rengeteg vírusszkeptikus és maszkellenes, ám ezek nem
a megfelelő reakciók egy ilyen válság idején. ‘It’s no coincidence that there are plenty
of virus skeptics and anti-maskers, but these are not the right reactions in a time of
such a crisis.’ Among the neologisms, there is also maszkszkeptikus ‘mask skeptical’
(7) and maszkhasználat-ellenes ‘that does not agree with using masks’ (6). A
particularly complex neologism is maszkellenes-vírustagadó-konteós ‘anti-mask, virus
denier, conspiracy theorist’ (2).
With its 49 occurrences, the adjective maszkmentes ‘mask-free’ is also rather
frequent in the corpus, e.g. A vírustagadók egy része péntekre maszkmentes napot
hirdetett. ‘Some of the virus deniers declared a mask-free day for Friday.’ The
corresponding abstract noun maszkmentesség ‘the state of being mask-free’ (5) is rather
rare.

By contrast, a person who agrees with wearing masks is maszkrajongó ‘fan
of masks’ (6); maszkpárti ‘pro-mask’ (21) or maszkhívő (3), e.g. Heves összetűzések
voltak országszerte a maszkpárti és maszkellenes tábor között, gyakran az üzletek
előtt. ‘There were fierce clashes across the country between the pro-mask and
anti-mask groups, often in front of shops.’

In the corpus we can also find the abstract nouns maszktagadás ‘negation of
masks’ (15) and maszkellenesség ‘the condition of being anti-mask’ (16), and also
maszkvita ‘mask debate’ (16) and maszkháború ‘mask war’ (8); the adjectives
maszk-elutasítási ‘mask refusing’ (e.g. a maszk-elutasítási hajlandóság az életkorral csökken
‘the propensity for mask refusal decreases with age’); maszkelutasító ‘mask refusing’;
maszktagadós (adj) ‘mask-denier’; maszktalanítva ‘unmasked; a person whose mask
was removed’.

Maszk- ‘mask’ is present 130 times in wider expressions (with omissions), e.g.
Emellett a hétvégén újra kötelező a maszk- és kesztyűviselés. ‘In addition, wearing a
mask and gloves is mandatory again over the weekend.’

The compound noun maszkgyártás ‘production of masks’ is present in the 2020
Timestamped corpus 123 times, while the related maszkgyártó ‘producer of masks’ is
recorded 120 times and maszkgyár ‘mask factory’ 7 times. It is also possible to find hits
for maszkgyáros ‘mask manufacturer’ (4), maszkgyáras ‘manufacturer of masks’ (2),
and maszkgyár-látogatás ‘mask factory visit’ (2). In the corpus, there are also a few
hapax legomena (with only 1 hit): maszkgyári (adj) ‘mask factory’ (e.g. Trump még a
maszkgyári programjára sem vett fel maszkot ‘Trump did not even wear a mask during
his mask factory visit’); maszkgyártási ‘mask manufacturing’ or
maszkgyártó-gép ‘mask making machine’.

In addition, in the corpus, there are neologisms concerning: 

(i) the necessity to sew masks [maszkvarrás ‘mask sewing’ (95); maszkvarró ‘sewer
of masks’ (32); maszkvarrógép ‘mask sewing machine’ (5); szájmaszkkészítő
‘mouth mask maker’ (5)], and

(ii) the initial worldwide mask shortage and the necessity to provide masks to the
population [maszkhiány ‘lack of masks’ (70); maszkigény ‘mask demand’ (3);
maszkosztás ‘distribution of masks’ (43); maszkkészlet ‘mask stock’ (18);
maszkbeszerzés ‘purchase of masks’ (17); maszkvásárlás ‘mask purchase’ (15);
maszkpiac ‘mask market’ (9); maszkadomány ‘mask donation’ (5);
maszkosztogatás ‘mask distribution’ (5); maszkdiplomácia ‘mask diplomacy’ (59);
maszkkalózkodás ‘mask piracy’ (1); maszkügy, maszk-ügy ‘mask affair’ (18);
maszkszállítmány ‘mask shipment’ (35); maszkkiállítás ‘mask exposition’ (5);
maszk-eladási (adj) ‘mask sale’ (e.g. Múlt hét végén drasztikusan megugrottak
a hazai gyógyszertárakban az orvosi maszk-eladási számok ‘Last weekend, sales
figures for surgical masks in Hungarian pharmacies increased
drastically’); maszk-szállítás ‘mask delivery’ (1); maszkbiznisz ‘mask
business’ (6); maszkeladás ‘mask selling’ (3)].


In addition, the corpus contains 58 rare mask-related neologisms and 53 hapax
legomena. All these creations are included in TCD, even if this kind of data is usually
left out of dictionaries. These hapax legomena play an important role in assessing the
productivity and creativity of the Hungarian language. It is well known that the
lifespan of a neologism is, from the moment of its first appearance, uncertain and
difficult to predict: some neologisms seem destined to last, while others do not.
In the long term, every forecast turns out to be uncertain and there is the risk of
excluding neologisms destined for success. So, considering the neologisms containing
korona- ‘corona-’, COVID-, járvány- ‘epidemic’, vírus- ‘virus’, maszk- ‘mask’, karantén-
‘quarantine’, and oltás ‘vaccine’ as constituents, I have decided to systematically
collect all the new words encountered, without taking into account their actual use and
the degree of their diffusion. Admittedly, there is a risk of including too many entries in
TCD, but there is also the advantage of identifying the paths of neological activity
with greater precision. Later, it will be possible to understand the reasons
for the success of some of these new words and to discuss the predictable failure
of many occasional and ephemeral creations. In any case, I consider it useful to record
all of these neologisms, hapax legomena included, even if I am aware that their
neological status is objectively less strong and sustainable.

In total, there are more than 210 mask-related neologisms in the corpus:
only 9 are formed by ‘simple’ derivation, 1 is the lemma maszk itself, and the rest
are compounds, some of which may be the result of multiple derivation. The neologisms
maszkné ‘maskne’ and maszkitisz ‘maskitis’ are blends (maszk + akné; maszk +
dermatitisz) that entered the Hungarian language from English. Aranymaszk
‘gold mask’ is an occasionalism. It refers to the story of the businessman Shankar
Kurhade, who bought a customized gold mask. Okosmaszk ‘smart mask’ is formed
following the examples of okostelefon ‘smart phone’, okoseszköz ‘smart device’,
okosszemüveg ‘smart glasses’, okosóra ‘smartwatch’, etc.

All of the words are accompanied by useful child elements providing indications 
concerning frequency, such as ‘frequent’, ‘rare’, ‘hapax’, and frequency of use in 
time, such as a particular date (the point in time when a word started to be used), 
first, second, etc. wave or other information related to the trends of the words 
treated (unusual increase or decrease in use). For each entry, at least one transla- 
tional equivalent will be provided in Italian and English. In Figure 1, the entry 
maszkvita ‘mask debate’ illustrates the microstructure of TCD. 
maszkvita; n

Definition pro és kontra a szájmaszkokról 

Word formation process compounding 

Frequency rare 

Trend peaks in March and September/October 2020 

Temporal use from March 2020 

Register formal 

Connotative effect 

Subject field 

Style literal 

Etymology 

References 

+ Van olyan magyar település egyébként, ahol a polgármester döntötte el a
maszkvitát: a koronavírus miatt szerdától csak maszkban mehetnek az emberek boltba
és más üzletekbe a Fejér megyei Velencén.

+ Emiatt húzódott sokáig a maszkvita is, de most már szinte mindenki azt javasolja,
elővigyázatosságból hordjunk maszkot ott, ahol másokkal találkozunk.

Note maszk + vita ‘debate, discussion’

It. dibattito sulle mascherine; noun mwe 

Etymology 

References 

+ Il dibattito sulle mascherine è strettamente legato a un’altra questione che ha
suscitato forti divisioni: in che modo il virus si sposta nell’aria e diffonde l’infezione?
+ La variante Delta riaccende il dibattito sulle mascherine negli Usa

Note 

Eng. mask debate; noun mwe 

Etymology 

References 

+ Delta variant reignites US mask debate 

+ Mask debate From School Boards to Courtrooms. 

Note 


Figure 1: Entry maszkvita ‘mask debate’.
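
The microstructure in Figure 1 could be represented as an XML entry, the kind of structured record a dictionary writing system such as Lexonomy works with. The following sketch is only an illustration under stated assumptions: the element and attribute names (entry, headword, equivalent, wordFormation, etc.) and the helper build_entry are hypothetical and do not reproduce the actual TCD schema. The field values are taken from Figure 1.

```python
import xml.etree.ElementTree as ET


def build_entry(headword, pos, fields, equivalents):
    """Build one entry element; all element names here are hypothetical."""
    entry = ET.Element("entry")
    hw = ET.SubElement(entry, "headword", {"pos": pos})
    hw.text = headword
    # Child elements carrying frequency, trend, register, etc.
    for name, value in fields.items():
        ET.SubElement(entry, name).text = value
    # One translational equivalent per target language.
    for lang, (text, eq_pos) in equivalents.items():
        eq = ET.SubElement(entry, "equivalent", {"lang": lang, "pos": eq_pos})
        eq.text = text
    return entry


entry = build_entry(
    "maszkvita", "n",
    {"wordFormation": "compounding", "frequency": "rare",
     "temporalUse": "from March 2020", "register": "formal"},
    {"it": ("dibattito sulle mascherine", "noun mwe"),
     "en": ("mask debate", "noun mwe")},
)
print(ET.tostring(entry, encoding="unicode"))
```

Keeping frequency, trend, and temporal use as separate child elements makes it possible to filter the dictionary later, for instance to list only the entries marked as hapax.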


If, due to translational difficulties, no equivalent can be given, a descriptive/explanatory
equivalent is added (cf. Figure 2). For example, the word nyunyóka ‘comfort object’
has been in use since May 13, 2020, with peaks in May, when it was used in
Chief Medical Officer Cecília Müller’s discourse, and in December 2020, when it was
named the word of the year.
nyunyóka; n

Definition A nyunyóka is a lovey and it can be anything that is safe for a baby or toddler to have 
at sleep time. It is a sort of comfort or transitional item — blanket or stuffed animal or other 
comfort object of affection that a baby or toddler brings to bed, and that provides comfort and 
soothing. 

Orthographic variants - 

Word formation process derivation 

Frequency Before May 13, 2020, the term nyunyóka was uncommon and was used only in baby 
talk. 

Temporal use since May 13, 2020, as a result of the massive media impact of Chief Medical
Officer Cecília Müller’s discourse during the press conference of the Operational Force,
concerning personal hygiene habits to teach kids and the necessity to wash comfort objects
frequently, this neologism entered the common language and became widely known and used.
Trend Increasing. The results of the search query using Google now list 46,500 pages (as of
December 17, 2021).

Register baby talk > common language 

Connotative effect term of endearment 

Subject field 

Style 

Etymology Mind a nyanya, mind a nyunyó föltehetőleg dajka-, gyermeknyelvi, hangulatfestő és
talán hangutánzó szó. Elképzelhető a nyúl, nyuszi szavak becézéséből fakad. Ezt erősíti, hogy a
neten nyuszi-nyunyi szundikendő is rendelhető.
https://e-nyelvmagazin.hu/2021/01/10/nyunyoka-nyunyo/

References Veszelszki 2020: 54-55. 

+ Kerülni kell, hogy a gyerek otthonról játékot hozzon el: pelenka, „nyunyóka”, játék maradjon
otthon, ám azokat otthon is tisztítsuk rendszeresen. 

Note Nyunyóka: Kisgyermek alvó játékszere, leginkább plüssfigura. Másnéven: alvóka, rongyi. 
Hangtani rokona a nyanya, a nagymama kedveskedő, az öregasszony gúnyos megnevezése. 

It. doudou; n 

Etymology 

References 

Note oggetto transizionale 

Eng. lovey; n 

Etymology 

References 

Note comfort object, transitional object 


Figure 2: Entry nyunyóka ‘comfort object’.


5 Conclusions 


The high number of coroneologisms draws attention to the creativity and vitality of
the Hungarian language in times of crisis, and the corpus analyses performed for the
Trilingual (HU, IT, EN) COVID-19 Dictionary provide a clearer picture of the change in
the vocabulary during the COVID-19 pandemic and of the role and function of the word
formation processes that contributed to the creation of these neologisms. The analyses
suggest that the most frequent word formation processes in the
Hungarian neologisms related to the pandemic are compounding and derivation,
but syntagms, blending, and semantic extension (changes in lexical meaning) are
also used. [Furthermore, in Hungarian, new words may also be productively created
by means of conversion, backformation, reduplication, clipping, loan words,
loan formations (e.g. calques), metaphor . . .] At the end of the pandemic, the analyses
will also reveal to what extent the Hungarian language borrows coroneologisms
from other languages.

Time passes, but the impact of the COVID-19 pandemic on the Hungarian language
is still strong in 2021. It is true that many terms had their peaks during the first
months of the crisis in 2020, but each phase and wave produces new topics and terms
and generates considerable frequency increases in the use of certain forms. Therefore,
the corpus-based dictionary will be a valuable tool to explore and analyze the
corona-lexicon in the Hungarian press and common language during this global emergency,
thanks to the microstructure of the entries, which, where possible, includes information
on frequency, trends, and temporal use. The entries also contain a morphological analysis,
so that TCD provides data that will help to analyze the trends and patterns in the
formation of new words and in their frequency of use in Hungarian.

Overall, this dictionary is useful for linguists and translators (e.g. suggesting 
more accurate translation equivalents for translating the coroneologisms from Hun- 
garian to English or Italian) or for scholars in the digital humanities. 


Bibliography 


Balázs, Géza (2020): A koronavírusról szóló beszéd (nyelv és folklór). In Kovács, László (ed.):
Globális kihívás - lokális válaszok. A koronavírus (Covid19) gazdasági és társadalmi
összefüggései és hatásai. Szombathely: Savaria University Press, 229-240.

Bušta, Jan, et al. (2017): JSI Newsfeed Corpus. In: The 9th International Corpus Linguistics
Conference, July 25-28, 2017, Birmingham, GB: extended abstracts. Birmingham: University of
Birmingham, [https://www.birmingham.ac.uk/Documents/college-artslaw/corpus/conference-archives/2017/general/paper382.pdf;
last access July 29, 2021].

Coroneologisms are going viral. In: Economic Times. 9 April 2020.
[https://economictimes.indiatimes.com/blogs/et-editorials/coroneologisms-are-going-viral/; last access June 30, 2020].

Dajka, Balázs (2020): Lecsapott a koronavírus az amúgy is nehéz időket élő kézicsapatra. In: 24.hu.
October 20, 2020. [https://24.hu/sport/2020/10/20/kezilabda-koronavirus-karanten-siofok-mtk/;
last access July 29, 2021].

Domonkosi, Ágnes/Ludányi Zsófia (2020): Társas távolságtartás és nyelvi közeledés - e-mailezési 
gyakorlatok a koronavírus idején. In: Kovács, László (ed.): Globális kihívás — lokális válaszok. 
A koronavírus (Covid19) gazdasági és társadalmi összefüggései és hatásai. Szombathely: 
Savaria University Press, 241-260. 




Elszabadult a koronavírus-járvány a világon: egy nap alatt 400 ezer fertőzött. In: Portfolio.hu.
19 October 2020. [https://www.portfolio.hu/gazdasag/20201019/elszabadult-a-koronavirus-jarvany-a-vilagon-egy-nap-alatt-400-ezer-fertozott-453448;
last access July 29, 2021].

Hát ezért is olyan veszélyes, hogy berobbant Magyarországon a koronavírus. In: Portfolio.hu.
20 October 2020. [https://www.portfolio.hu/gazdasag/20201020/hat-ezert-is-olyan-veszelyes-hogy-berobbant-magyarorszagon-a-koronavirus-453454;
last access July 29, 2021].

Istók, Béla/Lőrincz, Gábor (2020): A virolingvisztika részterületei. In Simon, Szabolcs (ed.): 12th
International Conference of J. Selye University. Language and Literacy Section. Conference
Proceedings. J. Selye University. Komárno. 83-92. [DOI: 10.36007/3761.2020.83; last
access July 29, 2021].

Kegyes, Erika/Lanzmaier-Ugri, Katharina (2020): A többnyelvű tájékoztatás kihívásai, stratégiái és
eredményei a koronavírus idején Ausztriában. In Kovács, László (ed.): Globális kihívás –
lokális válaszok. A koronavírus (Covid19) gazdasági és társadalmi összefüggései és hatásai.
Szombathely: Savaria University Press, 261-289.

Kilgarriff, Adam (2001): Web as corpus. In Rayson, Paul et al. (eds.): Proceedings of the Corpus 
Linguistics 2001 Conference, Lancaster University, March 29-April 2, 2001. Lancaster: UCREL, 
342-344. 

Kilgarriff, Adam/Grefenstette, Gregory (2003): Introduction to the Special Issue on the Web as
Corpus. In: Computational Linguistics 3(29), 333-347. [DOI: 10.1162/089120103322711569;
last access July 29, 2021].

Kovács, László (ed.) (2020): Globális kihívás – lokális válaszok. A koronavírus (Covid19) gazdasági
és társadalmi összefüggései és hatásai. Szombathely: Savaria University Press.

Lénárt, István (2020): Karanténykedünk: a karanténdepitól a karanténszexig — avagy hogyan 
árulkodnak tevékenységeinkről a Koronavírus-járvány alatt született szóösszetételek. In 
Kovács, László (ed.): Globális kihívás - lokális válaszok. A koronavírus (Covid19) gazdasági és 
társadalmi összefüggései és hatásai. Szombathely: Savaria University Press, 291-302. 

Ludányi, Zsófia (2020): Helyesírási kérdések pandémia idején. In: Amega: asztma és allergia színes 
tájékoztató magazin 4(27), 33-35. 

Ludányi, Zsófia (2021): Eloltották az oltóanyagot. In: Amega: asztma és allergia színes tájékoztató
magazin 2(28), 37-39.

Ludányi, Zsófia (2021): Oltakozás és pándémia - avagy a virolingvisztika legújabb kérdései. In: 
AMEGA: asztma és allergia színes tájékoztató magazin 1(28), 43-45. 

Méchura, Michal Boleslav (2017): Introducing Lexonomy: an open-source dictionary writing and
publishing system. In: Kosem, Iztok et al. (eds): Electronic lexicography in the 21st century:
Lexicography from Scratch. Proceedings of the eLex 2017 conference, September 19-21, 2017.
Leiden: Lexical Computing, 662-679.

Papp, Judit (2021): Il nostro lessico è diventato "virale". Il vocabolario dell'emergenza sanitaria,
economica e sociale ai tempi della pandemia di COVID-19. In: Studia Universitatis Babeș-Bolyai -
Philologia 1(2021), 325-344.

Roig-Marín, Amanda (2020): English-based coroneologisms: A Short Survey of Our Covid-19- 
Related Vocabulary. In: English Today, 1-3. DOI: 10.1017/s0266078420000255. 

Trampuš, Mitja/Novak, Blaž (2012): The internals of an aggregated web news feed. In: Proceedings
of the Fifteenth International Information Science Conference IS SiKDD. October 8-12, 2012,
[Ljubljana, Slovenia], 431-434.

Tyrrell, David Arthur John/Michael Fielder (2002): Cold Wars: The Fight Against the Common Cold. 
Oxford: Oxford University Press. 

Veszelszki, Ágnes (2020): Karanténszótár. Budapest: Interkulturális Kft.

Virology. Coronaviruses. In: Nature 220 (November 16, 1968): 650. [https://doi.org/10.1038/ 
220650b0; last access 29 July 2021].


Coronavirus-related neologisms: A challenge
for Croatian standardology and lexicography


1 Introduction 


Languages have always evolved to reflect societal changes, but in 2020-2021 their
evolution could be seen in real time. The appearance of Coronavirus¹ has led to an
abundance of new words and phrases, both in Croatian and in other languages.² As
stated by Lawson, “[t]his new vocabulary helps us make sense of the changes that
have suddenly become part of our everyday lives” (Lawson 2020).

Recent epidemics have given rise to new words and phrases which were not
necessarily coined for the current COVID-19³ pandemic. Still, those
words and phrases have gained a far wider usage since 2020, e.g., “the term infodemic
was coined in 2003 for the SARS epidemic, but has also been used to describe
the current proliferation of news around coronavirus” (OED Blog 2020).

Moreover, many terms used mostly by medical experts have entered the general
language (e.g. Coronavirus, epidemiology, asymptomatic). Since spring 2020, laypersons
have become familiar with terms that had been around for years without being
used in the general language; some date from the 19th century but have now
acquired a new and much wider usage. For example, “self-isolation (recorded
from 1834) and self-isolating (recorded from 1841) are now used to describe
self-imposed isolation to prevent catching or transmitting an infectious disease, where
in the 1800s these terms were more often applied to countries which chose to
detach themselves politically and economically from the rest of the world” (OED Blog
2020).


1 In English, this term is spelled Coronavirus and coronavirus. In this paper, we use Coronavirus. 
2 E.g., for Macedonian cf. Janusheva (2020), for Russian cf. Karachina (2020). 
3 In English, this term is spelled Covid-19 and COVID-19. In this paper, we use COVID-19. 


Note: This paper is written within the research project Croatian Web Dictionary – Mrežnik.

The media and social media play an important role in the appearance and
spreading of new Coronavirus-related words, phrases, and meanings, as well as of old terms
used in a new context. The appearance of Coronavirus was followed by Coronavirus
jokes,⁴ memes, and puns, e.g., Ljubav u doba kolere > Ljubav u doba korone > Život u
doba korone ‘love in the time of cholera’ > ‘love in the time of corona’ > ‘life in the
time of corona’.⁵ Even some Coronavirus-related nicknames appeared in Croatian
media and social media. For example, Apaurin⁶ Hrvatske ‘Apaurin of Croatia’ was a
nickname for the Croatian epidemiologist Alemka Markotić, meaning that she can
calm Croatia down.⁷ Toni Cjepinski ‘Toni the vaxxer’ is a nickname for the Croatian
singer Toni Cetinski, who is well known for his opposition to vaccination.⁸

The appearance of new words and phrases, new meanings of existing words,
and the shift of words and phrases from (mostly medical) scientific terminology to
the general language have been studied by linguists from different perspectives:
cognitive linguistics, lexicology, ethnolinguistics, phraseology, corpus linguistics,
etc. “Even a superficial glance at public discourse on the pandemic reveals that it is
saturated with metaphors (we talk about epidemic epicentre, epidemic focal point,
the wave of the epidemic, modern plague and flaming epidemic), and especially
with war metaphors (words like headquarters, first line of defence, invisible enemy
and the war against the virus).” (Štrkalj Despot/Ostroški Anić 2021: 3).


2 Methodology 


This paper focuses on standardological and lexicographical aspects of Coronavirus-related
neologisms in Croatian. The presented results are based on corpus analysis.
The initial corpus for this analysis consists of terms collected for the Glossary of
Coronavirus. This corpus has been supplemented by terms we collected on the Internet
and from the media. The general Croatian corpora, the Croatian Web Corpus hrWaC
(cf. Ljubešić/Klubička 2016) and the Croatian Language Repository (cf. Brozović
Rončević/Ćavar 2008: 173-186), were also used, but since they do not include
neologisms that entered the language after 2013, they could be used only to check whether
terms were in the language before that time. From October 2021, a specialized Corona corpus
compiled by Štrkalj Despot and Ostroški Anić (2021) became publicly available on


4 Coronavirus jokes in Croatian were analyzed by Miloš (2020).

5 Štrkalj Despot (2020) lists such puns in Croatian and mentions that the phrase u doba korone ‘in
the time of Corona’ has more than 4 million hits on Google.

6 Apaurin is the name of an anxiolytic.

7 However, as the pandemic continued, the nickname disappeared as quickly as it appeared.

8 In Croatian cijepiti se means ‘to vaccinate’.

request.⁹ The data from these corpora are analyzed with Sketch Engine (cf. Kilgarriff
et al. 2004: 105-116), a corpus query system loaded with the corpora, which enables the
display of lexeme context through concordances and (differential) word sketches
as well as the extraction of keywords (terms) and N-grams. The most common
collocations are sorted into syntactic categories. For English equivalents, in addition to the
sources found on the Internet, the enTenTen2020 corpus was consulted.
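
The two kinds of query mentioned above, N-gram extraction and concordance display, can be illustrated with a minimal Python sketch. It does not use or reproduce Sketch Engine; the function names `ngrams` and `kwic` and the toy token list (built from two pun phrases quoted in this paper) are assumptions made only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return an iterator over every run of n consecutive tokens (as tuples)."""
    return zip(*(tokens[i:] for i in range(n)))


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Toy "corpus" (hypothetical): two phrases, lower-cased and split on whitespace.
text = "ljubav u doba korone život u doba korone"
tokens = text.split()

# Frequency list of bigrams, most frequent first.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))

# Keyword-in-context lines for "doba" with one word of context on each side.
print(kwic(tokens, "doba", window=1))
```

A real corpus query system additionally lemmatizes and tags the text, which is what allows collocations to be sorted into syntactic categories; the sketch works on raw word forms only.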

In the second part of the paper, we analyze and compare the presentation of
Coronavirus terminology in the descriptive Glossary of Coronavirus and the
normative Croatian Web Dictionary – Mrežnik.


3 Loanwords, loan translations or Croatian 
neologisms 


The COVID-19 pandemic caused the appearance of many Coronavirus-related
neologisms in many languages. Due to the speed and intensity of this infection, many
words in Croatian have been directly borrowed from English (cf. Štrkalj Despot 2020: 2).

Many Coronavirus-related neologisms in Croatian are either loanwords or loan
translations from English. In some cases, it is difficult to determine whether a
particular term was independently formed in Croatian or is a loan translation from English.
Table 1 shows some Coronavirus-related loanwords which have entered Croatian
with little or no adaptation to the Croatian language system.¹⁰

English Coronavirus-related neologisms are often compounds/multi-word units,¹¹
blends, and abbreviations. English has many compounds with the first element


9 "First, we compiled a specialized corpus of Croatian media texts (referred to here as the Korona
corpus) using Sketch Engine corpus compilation tools. The corpus consists of manually selected
texts dated January 29 to December 23, 2020, all closely related to the coronavirus Covid-19
pandemic topics. For free access to the corpus, please write to the authors." (cf. Štrkalj Despot/Ostroški
Anić 2021: 180). Mihaela Matešić announced the compilation of another Croatian Corona-related
corpus in her presentations Analiza stavova u društvenim medijima u istraživanju krizne komunikacije
u doba pandemije (Analysis of attitudes in social media in the research of crisis communication
during the pandemic) in Rijeka, June 23, 2021 (5. simpozij SCIMETH, https://cji.uniri.hr/scimeth-2021/)
and Istraživanje krizne komunikacije u digitalnom okruženju u doba pandemije (Exploring
crisis communication in the digital environment during the pandemic) (with Slobodan Beliga), Osijek,
September 9-11, 2021 (http://www.hdpl.hdpl.hr). At the same symposium, Mateja Šporčić, Stjepan
Lacković, and Marina Baralić, in the presentation Big data resursi u istraživanju metafora (Big data
resources in the study of metaphors), announced the compilation of yet another Corona corpus.
However, as of November 2021, these corpora are not yet publicly available.

10 According to Croatian orthographic rules, unadapted loanwords are written in italics.

11 It is often difficult to differentiate between compounds and multi-word units in English as the 
spelling of these terms varies (one or two words).

Corona. The same model is very productive in Croatian Coronavirus-related
neologisms. In Table 2, some compounds with the first element Corona are given in
English and Croatian.


Table 1: English loanwords in Croatian.

English | Croatian
drive-in | drive-in
fake news | fake news
infodemia | infodemija
targeted | targetiran
lockdown | lock down
Corona free | korona free


Table 2: Compounds with the element Corona in English and Croatian.

English¹² | Definition | Croatian
Corona babies/Corona boomers | the generation born after December 2020 and who will become teenagers in 2033-2034; the terms coronials and quaranteens¹³ have the same meaning | koronijalci
Corona bed | hospital bed with a ventilator for Covid patients | koronakrevet
Corona crew | people one chooses to live with during the quarantine; the term quaranteam has the same meaning | -
Corona crisis | crisis caused by the COVID-19 pandemic | koronakriza
Corona cushion | fat acquired during lockdown due to COVID-19 | koronašpek
Corona haircut | haircut performed by oneself or another unqualified person | koronafriz
Corona humor | jokes and memes which appeared on (social) media during the COVID-19 pandemic | koronahumor
Corona partner/boyfriend/girlfriend | a partner during the lockdown period that implies brevity of the relationship | -
Corona rules | rules one has to obey during the COVID-19 pandemic | koronapravila


12 We have consulted Alyeksyeyeva/Chaiuk/Galitska (2020) and Khalfan/Batool/Shehzad (2020) 
for definitions of these terms. 
13 The term quaranteens sometimes has another meaning, ‘teenagers during the quarantine’.

Table 2 (continued)

English | Definition | Croatian
Corona speak | neologisms that have entered everyday discourse during the Coronavirus pandemic | -
Corona traffic light | marking of regions by different colors (green, yellow, and red) according to the number of registered newly infected by Coronavirus | koronasemafor


From Table 2, we can see that some of these terms have a Croatian equivalent
formed by the same model Corona + noun, e.g., Corona bed is koronakrevet, Corona
haircut is koronafriz, Corona traffic light is koronasemafor, etc. Some terms that exist
in English have not been recorded in Croatian, e.g., Corona boyfriend, Corona crew,
Corona partner.


Coronavirus-related English neologisms are often formed by blending. As this 
word-formation type is not typical in Croatian, the equivalent Croatian terms were 
usually formed differently, as shown in Table 3. 


Table 3: Coronavirus-related blends in English with their Croatian equivalents.

English term | Croatian term | Word formation elements and definition
coronacession | recesija uvjetovana koronom (‘recession caused by Corona’) | Corona + recession; recession caused by Coronavirus
coronials/quaranteens | koronijalci | Corona + Millennials; children conceived or born during the COVID-19 pandemic, reminiscent of the label millennial describing those raised around the turn of the millennium (cf. Words We're Watching: "Coronial")
covidiot | kovidiot | COVID + idiot; an insulting term for someone who ignores health advice about COVID-19, hoards food unnecessarily, etc. (cf. Macmillan Dictionary)
covidivorce | koronarazvod (coronadivorce) | COVID + divorce; divorce caused by the fact that partners spend too much time together due to COVID-19 lockdown
quarantime | vrijeme (provedeno) u karanteni (‘time (spent) in quarantine’) | quarantine + time; the time during COVID-19 when all of the days blend because you are stuck inside due to a worldwide pandemic (cf. Urban Dictionary)
quarantini | quarantini | quarantine + Martini; a drink that one drinks during lockdown
she-cession | - | she + recession; the recession caused by COVID-19, which affected women more than men¹⁴
twindemic | twindemija | twin + pandemic; flu season paired with COVID-19


If we compare English blends with their Croatian equivalents, we can conclude:
1. Some English blends are used as loanwords in Croatian; some are adapted to
Croatian orthography (covidiot > kovidiot, coronials > koronijalci), some are only
partly adapted (twindemija), and some are spelled as in English (quarantini);
2. Croatian has compounds (covidivorce > koronarazvod) as equivalents of English
blends;
3. Croatian uses multi-word expressions, e.g., coronacession > recesija uvjetovana
koronom (‘recession caused by Corona’), quarantime > vrijeme (provedeno) u
karanteni (‘time spent in quarantine’), as equivalents of English blends;
4. instead of the possible loanword karantinci for quaranteens, Croatian uses the
synonymous term koronijalci (‘coronials’);
5. Croatian does not have any equivalent of she-cession.


However, although blending is not common in Croatian, these Coronavirus-related
blends have been recorded on the Internet: kupomanija, from kupovanje (‘shopping’) +
manija (‘mania’) (‘mania for shopping, extensive shopping’), and Zoomor, from Zoom + umor
(‘fatigue’) (‘fatigue caused by Zoom’).


In English Coronavirus-related terminology, many abbreviations are used.
Abbreviations and the multi-word terms from which they are derived are shown in Table 4.


Some of the English abbreviations are also used in Croatian, e.g., COVID,
SARS-CoV-2, PCR.¹⁵ There are no Coronavirus-related abbreviations from Croatian terms.
In Croatian, either the English abbreviation or the full Croatian term is used e.g., 


14 “Of the 20.5 million jobs lost last month, women make up to 55 percent of those now looking for 
work. Their unemployment rate is about 2 percent higher than that of men.” (cf. Andrews 2020). 
15 In Croatian, PCR is read as in English ['pi:-'si:-'ar]. The Croatian reading would be ['pe:-'ce:-'er].

Table 4: Abbreviations in English and Croatian.

Abbreviation | Multi-word term | Croatian
ARDS | acute respiratory distress syndrome | akutni respiratorni distresni sindrom
CoV | Coronavirus | koronavirus
COVID-19 | Coronavirus disease 19 (the year of onset, 2019) | koronavirusna bolest 2019
ECMO | extracorporeal membrane oxygenation | izvantjelesna membranska oksigenacija
MERS | Middle East respiratory syndrome | bliskoistočni respiratorni sindrom
NAAT | nucleic acid amplification test | test koji se temelji na umnažanju nukleinske kiseline
PCR | polymerase chain reaction | polimerazna lančana reakcija
PPE | personal protective equipment | osobna zaštitna oprema
PUI | patient under investigation | osoba pod zdravstvenim nadzorom
SARS-CoV | severe acute respiratory syndrome – Coronavirus | teški akutni respiratorni sindrom – koronavirus
WFH | working from home | rad od kuće


osobna zaštitna oprema (‘personal protective equipment’), rad od kuće (‘work from
home’) are never abbreviated.


4 Coronavirus-related neologisms in Croatian – descriptive analysis


Croatian Coronavirus-related neologisms can be analyzed according to different 
criteria: 


1. single words vs. multi-word units 
Coronavirus-related neologisms can be single words or multi-word units as shown 
in Table 5. 

Single words can be analyzed according to word-formation types and can be 
divided into compounds, derivatives, and semi-compounds; examples are given 
in Table 6.


Table 5: Coronavirus-related single words and multi-word units.

Single words (Croatian) | Single words (English) | Multi-word units (Croatian) | Multi-word units (English)
koronazabava | coronaparty | skupni/kolektivni imunitet | group/collective immunity
samotestiranje | self-testing | drugi/treći/četvrti val | second/third/fourth wave


Table 6: Compounds, derivatives, and semi-compounds.

Compounds (Croatian) | Compounds (English) | Derivatives (Croatian) | Derivatives (English) | Semi-compounds (Croatian) | Semi-compounds (English)
koronaredar | Corona security guard | koronaš | person who has Coronavirus | delta-soj | delta strain
koronamafija | Corona mafia | antimasker | person who is against wearing masks | delta-varijanta/delta-inačica | delta variant
kovidpozitivan | positive for Coronavirus | samoizolacija | self-isolation | alfa-soj | alpha strain
prvozaraženi | first person to be infected | superširitelj/superširiteljica | superspreader (male, female) | alfa-varijanta | alpha variant

2. scientific/technical terms (academic jargon terms) vs. jargon words
Some Coronavirus-related neologisms are scientific terms while others are jargon
words; examples of scientific terms entering the general language are shown in
Table 7.


Table 7: Scientific terms entering the general language.

Croatian | English
anosmija (or anozmija) | anosmia
antigenski test | antigen test
antivakser/antivakserica | antivaxxer (male, female)
crvena zona | red zone
dezinfekcija | disinfection
epidemiolog/epidemiologinja | epidemiologist (male, female)
fizička distanca/fizička udaljenost | social distancing
izolacija/samoizolacija | isolation/self-isolation
izravnavanje krivulje | flattening the curve
komorbiditet | comorbidity
probir | screening
respirator | respirator
simptomatski/asimptomatski bolesnik/bolesnica | symptomatic/asymptomatic patient (male, female)
superširitelj/superširiteljica | super-spreader (male, female)


Many academic terms (mainly medical terms) have entered the general dis- 
course. Words like symptomatic/asymptomatic, disinfection, isolation, self-isolation, 
and super-spreader have become a part of general discourse, along with terms from 
other disciplines such as red zone, social distancing, and flattening the curve. “The 
expression social distancing, for example, has gone from being a relatively un- 
known piece of academic jargon to something we hear multiple times a day (al- 
though the World Health Organization prefers physical distancing). Usage of the 
phrase flattening the curve has increased exponentially. The word super-spreader 
has also spread from mouth to mouth at a dizzying rate.” (Mahdawi 2020). 

Examples of neologisms in Croatian jargon are shown in Table 8. 


Table 8: Jargon neologisms.

Croatian | English
cjepiša | a person who has been vaccinated, a person who is in favor of vaccination
imunitet krda | herd immunity
kašljosram | shame because of coughing
koronafriz | Corona haircut
kliconoša | a person or animal that carries microorganisms that cause disease but has no signs of a disease
koronakašljač | a person who coughs on purpose so people would think he has COVID-19
koronaš/koronašica | a person (male/female) infected by Coronavirus
koronašpek | Corona fat
koronizirati | to infect with Coronavirus
kovidaš | a person having COVID-19
necjepiša | anti-vaxxer, a person who is against vaccination


If we compare scientific terms entering the general language with jargon terms, 
we can see that scientific terms are mostly of Latin origin and reflect the English 
term, while jargon terms are mostly derivatives formed by suffixation from Croatian 
elements. However, there are many jargon terms with the first element Corona, 
which closely mirror the English model. 


3. new words vs. words acquiring a new specialized meaning, often through
metaphorisation or specialization of meaning

The Coronavirus pandemic caused the appearance of many new terms and of
additional meanings of existing words; examples are given in Table 9.


Table 9: New words and new meanings.

New words (Croatian) | New words (English) | New meanings (Croatian) | New meanings (English)
koronakriza | Corona crisis | balon, balončić, mjehurić | bubble
kovidaš | a person having COVID-19 | zatvaranje | lockdown


4. terms denoting disease and diagnosis, terms denoting human reactions, 
and new way of life 

Terms can be roughly divided according to their meaning into these groups: 1. the 
disease, diagnosis, and disease-related terms; 2. human reactions to the disease and 
behavioral patterns; 3. new work and school life; examples are given in Table 10. 


5 Questions from native speakers 


From the beginning of the pandemic in February 2020, the Institute of Croatian
Language and Linguistics received many questions from both the media and the general
public asking for language advice. Soon it became apparent that Coronavirus-related
terminology presented many problems for Croatian speakers. For example, many
orthographic variants of the same term were used simultaneously in the media,
sometimes even in the same text, e.g., COVID bolnica, covid bolnica, Covid bolnica,
COVID-bolnica, covid-bolnica (‘COVID hospital’). Some of these problems also occur
in English, but when orthographic variants are combined with lexical synonyms,
many terms denoting the same concept are recorded, e.g., the terms COVID potvrda,
COVID-potvrda, covid potvrda, covid-potvrda, COVID certifikat, COVID-certifikat, covid
certifikat, covid-certifikat, COVID putovnica, COVID-putovnica, covid putovnica,
covid-putovnica are all equivalents of the English term COVID certificate.
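
The twelve attested terms for COVID certificate follow from three independent choices: the noun (three lexical synonyms), the case of the first element, and the separator. A minimal Python sketch, assuming only the forms listed above, makes this combinatorics explicit; the names `variants` and `normalize` are hypothetical, and the normalization key is merely one possible way of grouping variants for lookup, not a statement about the recommended Croatian spelling.

```python
from itertools import product

# Spelling choices observed in the terms listed above.
FIRST_ELEMENT = ["COVID", "covid"]               # upper-case vs. lower-case
SEPARATOR = [" ", "-"]                           # two words vs. hyphenated
NOUNS = ["potvrda", "certifikat", "putovnica"]   # lexical synonyms


def variants(nouns=NOUNS, first=FIRST_ELEMENT, sep=SEPARATOR):
    """Return every combination of noun, first-element spelling, and separator."""
    return [f"{f}{s}{n}" for n, f, s in product(nouns, first, sep)]


def normalize(term):
    """Collapse case and hyphenation into one (hypothetical) lookup key."""
    return term.lower().replace("-", " ")


all_variants = variants()
print(len(all_variants))                         # 3 nouns x 2 cases x 2 separators
print(all_variants[:4])                          # the four spellings of the first noun
print(sorted({normalize(v) for v in all_variants}))
```

Orthographic normalization alone reduces the twelve forms to three keys; merging the three lexical synonyms into one concept is a separate terminological decision that such a sketch cannot make.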


Table 10: Terms divided according to meaning.

Disease, diagnosis, and disease-related terms (Croatian) | (English) | Human reactions, prevention (Croatian) | (English) | New life (Croatian) | (English)
koronavirus | Coronavirus | laktovanje/laktarenje | elbow bump | fizička/socijalna udaljenost/distanca | social distance
nulti pacijent | patient zero | samoizolacija | self-isolation | hibridna nastava | hybrid learning / blended learning
polumaska | half-mask | potvrda o cijepljenju | certificate of vaccination | rad od kuće | work from home


Most of the questions could roughly be divided into two groups: 1. the correct spell- 
ing of certain terms, 2. the correct Croatian term for an English term. Table 11 shows 
some characteristic questions from native speakers divided according to ortho- 
graphic, grammatical, and lexical levels. 


Table 11: Questions from native Croatian speakers.

| Level | Questions | Answers |
|---|---|---|
| Orthographic level | What is the correct spelling of words derived from the word cijepiti (‘vaccinate’)? | docijepiti/docjepljivati (‘revaccinate’), procijepljenost (‘vaccination rate’), cijepljenik (‘vaccinee’), cjepivo (‘vaccine’), cjepitelj (‘vaccination administrator’), cjepilište (‘vaccination point’) (cf. Lewis 2021: 28-33) |
| | What is the correct spelling of the names of some vaccines? | Pfizer-BioNTech, AstraZeneca-Oxford, Johnson & Johnson, Sputnjik (cf. Lewis 2021: 28-33) |
| | What is the correct spelling of terms with the element COVID? | COVID incidencija (‘COVID incidence’), COVID kredit (‘COVID loan’), COVID krevet (‘COVID bed’), COVID negativan (‘COVID negative’), COVID pacijent (‘COVID patient’), COVID planinar (‘COVID climber’), COVID pozitivan (‘COVID positive’), COVID redar (‘COVID monitor’), COVID situacija (‘COVID situation’), post-COVID |
| | What is the correct spelling of terms with the element korona- (‘Corona-’)? | The element korona- should be spelled together with the next element: koronabolnica (‘Corona hospital’), koronafobija (‘Corona phobia’), koronahumor (‘Corona humor’), koronakriza (‘Corona crisis’), etc. |
| | What is the correct spelling of terms alfa(-)soj (‘alpha variant’), delta(-)varijanta, delta(-)inačica (‘delta variant’), delta(-)plus(-)soj (‘delta plus variant’)? | The correct spelling is alfa-soj, delta-varijanta, delta-inačica, and delta plus soj. |
| Morphological level | How should we decline COVID-19? | genitive: COVID-a 19; dative: COVID-u 19 |
| | How should we decline the names of vaccines? | genitive: Pfizer-BioNTecha, AstraZeneca-Oxforda, Johnsona & Johnsona; dative: Pfizer-BioNTechu, AstraZeneca-Oxfordu, Johnsonu & Johnsonu (cf. Lewis 2021: 28-33) |
| Word-formation level | What is the correct adjective derived from the noun epidemija (‘epidemic’) and epidemiologija (‘epidemiology’)? | The adjective epidemiološki (‘relating to epidemiology’) is derived from epidemiologija (‘epidemiology’), and epidemijski (‘relating to the epidemic’) is derived from epidemija (‘epidemic’). The measures are protuepidemijske (‘anti-epidemic’). |
| | Is it better to say antigen-testovi or antigenski testovi (‘antigen tests’)? | The Croatian term should be antigenski testovi. |
| | Is it correct to say eksponencijalni or eksponencijski rast (‘exponential growth’)? | eksponencijski rast |
| Syntactic level | Is it correct to say NAAT test? | As T in NAAT means ‘test’, NAAT test is a pleonasm and only NAAT should be used. |
| | What is the correct gender of the adjective in the term glasnički/glasnička mRNK (‘messenger mRNA’)? | As acronyms except those ending in a are of masculine gender, the correct term is glasnički mRNK. |
| Lexical level | Is there a Croatian term for socijalna distanca (‘social distance’), lokalna transmisija (‘local transmission’), imunitet krda (‘herd immunity’), drive-in testiranje (‘drive-in testing’)? | Croatian terms are fizička udaljenost, lokalni prijenos, kolektivni imunitet, provozno testiranje. |
| | Are the words karantena (‘quarantine’) and izolacija (‘isolation’) synonyms?¹⁶ | Izolacija (‘isolation’) is the separation of a person or a group of people from the rest of the population. Samoizolacija (‘self-isolation’) is a form of self-imposed isolation. Karantena (‘quarantine’) is a legally imposed restriction of movement of people exposed to a contagious disease to see if they become sick. There is no self-quarantine. Quarantine is a stricter and controlled isolation. Every quarantine is isolation, but not every isolation is quarantine. |
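
Two of the answers in Table 11 are regular enough to be written down as rules. The following is a minimal sketch, not the Institute's tooling: the function names are hypothetical, and the code assumes it is given a single candidate term rather than running text. It covers only the spelling of korona- compounds and the declension pattern of COVID-19.

```python
import re

# Hypothetical helpers; the rules themselves come from the answers in Table 11.


def join_korona(term: str) -> str:
    """Write the element korona- together with the element that follows it."""
    # "korona virus" / "korona-kriza" -> "koronavirus" / "koronakriza"
    return re.sub(r"\bkorona[\s-]+(?=\w)", "korona", term)


# Case endings attach to COVID, and the number 19 follows as a separate word.
# Genitive and dative are given in Table 11; instrumental appears in Table 15.
COVID_ENDINGS = {"genitive": "a", "dative": "u", "instrumental": "om"}


def decline_covid19(case: str) -> str:
    """Return COVID-19 in the requested case (nominative is unchanged)."""
    if case == "nominative":
        return "COVID-19"
    return f"COVID-{COVID_ENDINGS[case]} 19"


print(join_korona("korona virus"))   # koronavirus
print(decline_covid19("genitive"))   # COVID-a 19
```

Terms with the element COVID are deliberately left untouched by `join_korona`, since Table 11 gives them as two separate words (COVID pacijent, COVID bolnica).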


Some of the answers to these questions were published online in the two databases
Jezični savjetnik (‘language advice’) and Bolje je hrvatski (‘better in Croatian’),¹⁷
and in a special issue of the journal Hrvatski jezik (‘Croatian language’) on
Coronavirus and e-learning during the lockdown. In the papers, the linguistic aspects
of the COVID-19 pandemic were analyzed from different points of view: cognitive
linguistics, i.e., Coronavirus-related metaphors (cf. Štrkalj Despot 2020: 1-7),
onomastics,¹⁸ i.e., names of people fighting COVID-19 (cf. Vidović 2020: 18-19),
phraseology (cf. Kovačević 2020: 25-29), standardology (cf. Blagus Bartolec 2020: 30-32),
etymology (cf. Ivšić Majić 2020: 43-44), and e-learning (cf. Hudeček/Mihaljević 2020b:
13-17 and Jozić et al. 2020: 20-24).


16 The speakers asked similar questions about the terms antigen (‘antigen’) and antitijelo
(‘antibody’), pandemija (‘pandemic’) and epidemija (‘epidemic’), virulencija (‘virulence’) and
patogenost (‘pathogenicity’), cijepiti and procijepiti (both derivatives of vaccinate meaning
‘to vaccinate’ and ‘to obtain a vaccination rate’), etc.

17 A website that suggests Croatian equivalents for Anglicisms.

18 Onomastic analysis of the names of people leading the fight against COVID-19 in Croatia.

After the publication of the Glossary of Coronavirus in April 2020, questions and
suggestions from the Glossary users followed. They suggested the inclusion of some
terms, proposed new words (drive-in > dovozni), asked for etymological, normative,
and pragmatic explanations (why is it correct to spell koronavirus and not korona virus),
and suggested corrections or changes of some terms and/or definitions. They offered
praise (well done for drive in > provozni) and criticism, e.g., zaštitna maska (‘protective
mask’) is not the same as kirurška maska (‘surgical mask’), imunološki sustav should be
imunosni sustav (‘immune system’).

Some of the comments are shown in Table 12. 


Table 12: Questions on the Glossary of Coronavirus.

| Question | Answer |
|---|---|
| Is it correct to say epidemiološka or epidemijska kriza (‘epidemiological crisis’)? | The word epidemijski is derived from epidemija (‘epidemic’) and the word epidemiološki is derived from epidemiologija (‘epidemiology’). So the adjective epidemijski is related to the epidemic, and the adjective epidemiološki is related to epidemiology. An epidemic causes the crisis, so the term should be epidemijska kriza (‘epidemic crisis’). |
| Shouldn’t epidemija (‘epidemic’) be replaced by the Croatian word pošast? | The word pošast is a synonym of epidemija, so epidemija can be replaced by pošast in the general language. However, epidemija is a medical term and pošast belongs only to the general language. |
| Could the Croatian neologism raskuživalo replace dezinficijens (‘disinfectant’)? | The Croatian term raskuživač has the same meaning as raskuživalo, so there is no need to form a new term from the same stem with a different suffix. |
| Is it better to say simptomski bris or simptomatski bris (‘symptomatic swab’)? | As the adjective should relate to the noun simptom (‘symptom’) according to Croatian word-formation rules, it should be simptomski bris. |


6 Coronavirus-related neologisms in Croatian lexicography


Many countries have online glossaries and dictionaries of Coronavirus-related
terminology, e.g., German Neuer Wortschatz rund um die Coronapandemie; Dutch
Coronawoordenboek.

Coronavirus-related vocabulary presented challenges to lexicographers as they
had to make many choices in a short time and without (or before) an adequate
corpus. “It is a rare experience for lexicographers to observe an exponential rise in
usage of a single word in a very short period, and for that word to come
overwhelmingly to dominate global discourse, even to the exclusion of most other topics.
Covid-19, a shortening of coronavirus disease 2019, and its various manifestations
has done just that.” (cf. OED Blog 2020).


6.1 Glossary of Coronavirus — a descriptive dictionary 


From the beginning of the pandemic in Croatia (February 2020), lexicographers from
the Institute of Croatian Language and Linguistics started collecting
Coronavirus-related words and expressions used in Croatian media, social media, and
government briefings. They regularly followed the daily newspapers Jutarnji list and
Večernji list, TV news, press releases of the Civil Protection Headquarters, and some
portals. As there was no Croatian corpus that included Coronavirus-related terms, the
terms were collected manually by seven collaborators. The first version of the Glossary
of Coronavirus¹⁹ was published in the daily newspaper Jutarnji list on March 16, 2020.
In April 2020, the supplemented version of the Glossary of Coronavirus was posted
online. In the Glossary, a short definition is given for each term. The phrase x has
the same meaning as y connects synonymous terms, e.g., imunitet krda (‘herd immunity’)
and kolektivni imunitet (‘collective immunity’). The Glossary included some names
(and even nicknames), e.g., Stožer civilne zaštite (‘civil protection headquarters’)
and HZJZ, the acronym of Hrvatski zavod za javno zdravstvo (‘Croatian institute for
public health’). The purpose of the Glossary was to meet the needs of Croatian speakers
as soon as possible. It usually records terms as they are used and does not give any
normative advice. It includes jargon words as well as scientific terms which entered
the general language. Table 13 shows selected entries from the Croatian Glossary of
Coronavirus. As the Glossary of Coronavirus is monolingual, we added the English
translation.

In October 2021, the Glossary had 168 terms. For this paper, we continued collecting
new terms in a Google document (so not all terms mentioned in this paper have been
recorded in the Glossary) and hope that this Glossary will be supplemented in the
future. After the information on the Glossary was published on the Facebook page of
the Institute of Croatian Language and Linguistics²⁰ (which has 7,700 followers), we
received many comments and suggestions from followers. The information on the Glossary
had 11,619 page views; its reach was 9,921 and impression 736. It had 132 reactions
from the users.


19 The initiative for compiling the Glossary came from the principal of the Institute of Croatian
Language and Linguistics, Željko Jozić. Terms were selected by Goranka Blagus Bartolec, Lana
Hudeček, Kristian Lewis, Ivana Matas Ivanković, Maja Matijević, and Milica Mihaljević, and the
editors of the Glossary were Lana Hudeček, Željko Jozić, Kristian Lewis, and Milica Mihaljević.

20 https://hr-hr.facebook.com/ihjj.hr.


Table 13: Selected entries from the Glossary of Coronavirus.

| Croatian term | Definition | Synonym | English term | English definition |
|---|---|---|---|---|
| karantena | prisilno, zakonski ili službeno izdvajanje potencijalno zaraženih osoba iz okoline ili čitavih naselja kako bi se spriječilo daljnje širenje virusa; potječe od talijanske riječi quarantina (‘40 dana’). Prvi put kao zaštitna mjera primijenjena je odlukom Velikoga vijeća Dubrovačke Republike 1377. godine. | | quarantine | forced, legal, or official removal of potentially infected persons from the environment or entire settlements to prevent further spreading of the virus; it comes from the Italian word quarantina (‘40 days’). It was first applied as a protective measure by the Grand Council of the Republic of Dubrovnik in 1377. |
| kirurška maska | navlaka koja se stavlja na lice kao zaštita od prenošenja virusa (kašljanjem, izdisanjem, kihanjem i sl.) | zaštitna maska | protective mask | cover that is put on the face as protection against virus transmission (coughing, exhaling, sneezing, etc.) |
| kliconoša | osoba ili životinja koja u sebi nosi određeni mikroorganizam koji prouzročuje bolest, ali nema znakova bolesti | | carrier | a person or animal that carries a particular microorganism that causes the disease but has no signs of the disease |
| kolapsologija | u novije vrijeme popularan pokret koji upozorava na moguću propast (kolaps) društava te proučava uzroke i posljedice urušavanja civilizacija | | collapsology | in recent times, a popular movement that warns of the possible collapse of society and studies the cause and consequences of the collapse of civilization |
| kolektivni imunitet | imunitet koji je steklo dovoljno ljudi da se bolest prestane širiti | imunitet krda | collective immunity | immunity acquired by enough people to stop the disease from spreading |
| komorbiditet | istodobna pojava dviju ili više bolesti | | comorbidity | simultaneous occurrence of two or more diseases |
| korona | kolokvijalni naziv za koronavirusnu bolest, tj. COVID-19 | | Corona | colloquial name for Coronavirus disease, i.e., COVID-19 |
| koronabolnica | bolnica predviđena za smještaj i liječenje zaraženih koronavirusom | | COVID hospital | a hospital designed to accommodate and treat people infected by Coronavirus |


6.2 Croatian Web Dictionary – Mrežnik – a normative dictionary


After the compilation of the descriptive Glossary of Coronavirus, it was decided that
some of the Coronavirus-related terms should be included in the normative Croatian
Web Dictionary – Mrežnik, an online, free, corpus-based, monolingual, hypertext,
searchable, normative dictionary compiled at the Institute of Croatian Language and
Linguistics.²¹ Mrežnik has three modules: for schoolchildren, for adult native
speakers of Croatian, and for non-native speakers (cf. Hudeček/Mihaljević 2020a). The
microstructure of Mrežnik (module for adult native speakers) is shown in Figure 1.

The inclusion of Coronavirus-related terms in Mrežnik was motivated by the
questions and comments from the Glossary users, which reflected their need for
further explanation and normative guidance.

The selection of terms was made by the editors according to these criteria:

1. more frequent words;
2. words which belong to the standard language;
3. words with a normative or pragmatic problem connected to their usage;
4. Coronavirus-related medical terms used in the general language;
5. word-formation clusters;
6. the most common jargon words;
7. names are not included.


Coronavirus-related terms with compounds and derivatives were added to the Mrežnik
wordlist, as shown in Table 14. In many cases, these are medical terms that existed in
Croatian terminology but would not have been included in a general dictionary if their
frequency in everyday discourse had not increased due to COVID-19.

The problem for the Croatian dictionary compilers was that most Coronavirus-related
terms could not be found in the general Croatian corpora, the Croatian Web Corpus
hrWaC and the Croatian Web Repository.

This meant that the lexicographers had to select examples manually. The web- 
site of the Croatian Web Archive (Hrvatski arhiv weba) has a thematic collection 
COVID-19 in which the lexicographers could check each term and find examples. 

However, in October 2021 a small Coronavirus corpus (called Korona) consisting
of a little over 280,700 words was compiled by Štrkalj Despot and Ostroški Anić.
The Mrežnik authors and editors used the Korona corpus in editing Coronavirus-related
entries, for supplementing the wordlist (using keywords and N-grams), adding
collocations (using word sketches) and examples (using concordance).


21 A demo version of Mrežnik (from A to F) is available online at https://rjecnik.hr/mreznik/.


Figure 1: Microstructure of Mrežnik, module for adult native speakers.
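
To make the N-gram step concrete, here is a minimal sketch of how frequent word sequences can be counted in a text. It is an illustration only: the paper does not describe the corpus software in detail, the function name is hypothetical, and the sample sentences are invented for the example rather than taken from the Korona corpus.

```python
from collections import Counter
import re


def count_ngrams(text: str, n: int) -> Counter:
    """Count every sequence of n consecutive word tokens in the text."""
    # Lower-case and keep hyphenated forms such as "covid-19" as one token.
    tokens = re.findall(r"\w+(?:-\w+)*", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample text, used only to show the counting.
SAMPLE = "Drugi val epidemije. Treći val epidemije stiže, a četvrti val se očekuje."

bigrams = count_ngrams(SAMPLE, 2)
print(bigrams.most_common(1))  # [(('val', 'epidemije'), 2)]
```

This simple version ignores sentence boundaries, so a bigram may span two sentences. Frequent sequences found this way are only candidates for collocations such as drugi val; an editor still has to judge which of them belong in an entry.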


Table 14: A sample of Coronavirus-related terms with compounds and derivatives added to the
wordlist.²²

| Croatian | English | Croatian | English | Croatian | English |
|---|---|---|---|---|---|
| epidemija | epidemic (noun) | dezinfekcija | disinfection | korona | Corona |
| epidemiolog | epidemiologist | dezinficijens | disinfectant | koronabolnica | Corona hospital |
| epidemiologinja | epidemiologist (female) | dezinficirati | disinfect | koronafobija | Corona phobia |
| epidemiologinjin | belonging to the epidemiologist (female) | | | koronahumor | Corona humor |
| epidemiologov | belonging to the epidemiologist | | | koronavirus | Coronavirus |
| epidemiološki | epidemic (adjective) | | | | |


Table 15 shows the entry COVID-19²³ with its subentries COVID ambulanta (‘COVID
infirmary’), COVID bolnica (‘COVID hospital’), COVID infekcija (‘COVID infection’),
COVID odjel (‘COVID ward’), COVID ordinacija (‘COVID practice’), COVID pacijent
(‘COVID patient’), and COVID pozitivan (‘COVID positive’). The terms are illustrated
by examples and collocations, and the etymology and usage of the term COVID-19 are
explained.


Table 15: Part of the entry COVID-19 in Mrežnik.

| Section | Croatian | English |
|---|---|---|
| definition | COVID-19 bolest je koju prouzročuje novi soj koronavirusa SARS-CoV-2 (teški akutni respiratorni sindrom, koronavirus 2). | COVID-19 is a disease caused by a new strain of the Coronavirus SARS-CoV-2 (severe acute respiratory syndrome, coronavirus 2). |
| corpus example²⁴ | Bolest COVID-19 prenosi se kapljičnim putem i izravnim kontaktom, preko kapljica sline ili sluzi govorom, disanjem ili kašljanjem zaražene osobe u blizini druge zdrave osobe. | COVID-19 disease is transmitted by droplets and direct contact, through droplets of saliva or mucus by speech, breathing, or coughing of an infected person near another healthy person. |
| collocations | Kakav je COVID-19? dugotrajan, težak | What is COVID-19 like? long-lasting, difficult |
| | Što COVID-19 može? prijetiti, širiti se, ubiti, uništiti (gospodarstvo, turizam), vladati | What can COVID-19 do? threaten, spread, kill, destroy (economy, tourism), rule |
| | Što se s COVID-om 19 može? biti pozitivan na njega, oboljeti od njega; preboljeti ga | What can be done with COVID-19? be positive for it, fall ill with it; get over it |
| | U vezi s COVID-om 19 spominje se: bolest, borba, kriza, simptom, slučaj | COVID-19 is mentioned in connection with: illness, struggle, crisis, symptom, case |
| subentries | COVID ambulanta | COVID infirmary |
| | COVID bolnica | COVID hospital |
| | COVID infekcija | COVID infection |
| | COVID odjel | COVID department |
| | COVID ordinacija | COVID practice |
| | COVID pacijent | COVID patient |
| | COVID pozitivan | COVID positive |


22 On compiling the wordlist for Mrežnik cf. Hudeček/Mihaljević 2020a.
23 The entry COVID-19 is available on https://rjecnik.hr/mreznik/index.php/covid-19/.


Entries in Mrežnik contain collocations introduced by collocational questions
and introductory phrases (cf. Hudeček/Mihaljević 2020c: 78-111). At the beginning
of the compilation process, as word sketches could not be used for
Coronavirus-related collocations, the compilers had to find collocations by searching
the Internet and the Croatian Web Archive. However, after the appearance of the new
Korona corpus, some new collocations were added to the Coronavirus-related entries.
Each subentry also has examples and collocations.

Internal links connect synonyms, antonyms, hyponyms, and feminine/masculine
pairs.²⁵ Some of the definitions of Coronavirus-related terms in Mrežnik are
taken from the Glossary. The terms that are recorded in the Glossary are linked with
external links.
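
The linking described here can be pictured as a small data model in which internal links are attached to individual senses and external links to the entry as a whole. The sketch below is an assumption made for illustration: the class and field names are hypothetical and do not reflect how Mrežnik is actually stored. The example data follow the entries SARS-CoV-2 and koronavirus, where only the first sense of koronavirus is a synonym of SARS-CoV-2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sense:
    definition: str
    # Internal links: headwords of other entries in the same dictionary.
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    senses: List[Sense]
    label: Optional[str] = None  # e.g. the terminological label "med."
    # External links: names of outside sources, e.g. the Glossary.
    external_links: List[str] = field(default_factory=list)


def synonym_links(entry: Entry) -> Dict[int, List[str]]:
    """Map each sense number (starting at 1) to its synonym headwords."""
    return {i: s.synonyms for i, s in enumerate(entry.senses, start=1) if s.synonyms}


DICTIONARY = {
    e.headword: e
    for e in [
        Entry("SARS-CoV-2", [Sense("virus strain", ["koronavirus"])], label="med."),
        Entry(
            "koronavirus",
            [Sense("virus strain", ["SARS-CoV-2"]), Sense("family of viruses")],
            external_links=["Glossary of Coronavirus"],
        ),
    ]
}

print(synonym_links(DICTIONARY["koronavirus"]))  # {1: ['SARS-CoV-2']}
```

Keeping synonym links at the level of the sense rather than the entry is what allows a headword with two meanings to be linked through one meaning only.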


24 In Mrežnik, 4-5 corpus examples are given for each meaning of the headword. In Table 13, only 
one example is given for illustration. 

25 Linking is important in Mrežnik. Internal links link one Mrežnik entry to another (synonyms,
antonyms, male/female pairs, word-formation, etc.). External links link a Mrežnik entry to an
external source (cf. Hudeček/Mihaljević 2019), e.g. the Glossary of Coronavirus.


The definitions and linking of two partly synonymous entries are shown in Table 16.


Table 16: Definition and links in the entries SARS-CoV-2 and Coronavirus.

| Entry section | SARS-CoV-2 (Croatian) | SARS-CoV-2 (English) | Coronavirus (Croatian) | Coronavirus (English) |
|---|---|---|---|---|
| definition 1 | SARS-CoV-2 (kratica od engl. Severe acute respiratory syndrome coronavirus 2 ‘teški akutni respiratorni sindrom koronavirus 2’) jedan je od sojeva virusa iz istoimene porodice koji prouzročuje koronavirusnu bolest, tj. COVID-19. | SARS-CoV-2 (short for English Severe acute respiratory syndrome coronavirus 2) is one of the virus strains from the family of the same name that causes Coronavirus disease, i.e. COVID-19. | Koronavirus jedan je od sojeva virusa iz istoimene porodice koji prouzročuje koronavirusnu bolest, tj. COVID-19. | Coronavirus is one of the virus variants from the family of the same name that causes Coronavirus disease, i.e. COVID-19. |
| references: synonym | koronavirus 1 | | SARS-CoV-2 | |
| definition 2 | - | | samo u množini: Koronavirusi su porodica virusa (Coronaviridae) koji imaju jednolančani DNK i lipidnu ovojnicu u obliku krune; mogu zaraziti ptice i sisavce, a u posljednje vrijeme i ljude. | only in the plural: Coronaviruses are a family of viruses (Coronaviridae) with single-stranded DNA and a crown-shaped lipid envelope; they can infect birds, mammals, and humans. |


The headword SARS-CoV-2 is marked with the terminological label med. (medical
term). It has a definition and etymological explanation, followed by examples and a
link to the synonym koronavirus (‘Coronavirus’), also a headword in Mrežnik. The
headword koronavirus (‘Coronavirus’) has two meanings, and only the first is linked
to SARS-CoV-2.

Normative and/or pragmatic advice is given in all cases where the user might
not be sure which word to use or when to use a specific term. Examples of
normative and pragmatic advice in Mrežnik are shown in Table 17.

Sometimes, new meanings, examples, and collocations have been added to the
existing entries (which were not Coronavirus-related), e.g., in the entry val (‘wave’)
the collocations drugi val (‘second wave’), treći val (‘third wave’), and četvrti val
(‘fourth wave’) were added. In the entry balon (‘balloon’), the meaning ‘protective
measure in which a group of people physically interact or socialize only with each
other’ and some examples from the Internet and/or corpus were added.

Anosmija/anozmija (‘anosmia’) is one of the medical terms whose frequency has
increased due to Coronavirus. This term was recorded in two forms, anosmija and
anozmija. In the Croatian terminological database Struna (cf. Bratanić/Ostroški
Anić 2013), the term anozmija was preferred.

However, after the appearance of Coronavirus and the entrance of this term
into general discourse, it was used in the form anosmija by the media. Therefore, it
was decided to have the term anosmija as the headword in Mrežnik and anozmija as
its synonym.²⁶ In the normative note, the reasons for this are explained. The entry
anosmija in Mrežnik is shown in Table 18.


26 The entry anosmija is available on https://rjecnik.hr/mreznik/index.php/anosmija/.


Table 17: Normative and pragmatic advice.

| Headword | Croatian | English |
|---|---|---|
| koronavirus (Coronavirus) | Normative advice: U stručnoj i široj općoj uporabi u hrvatskome su jeziku ustaljeni prilagođeni nazivi za viruse. Prema pravilu o pisanju stručnih naziva malim se početnim slovom pišu svi jednorječni nazivi te svi višerječni nazivi osim riječi koja je i sama ime ili posvojni pridjev izveden od osobnoga imena, npr. ebola, zika / virus zika, virus Zapadnoga Nila / zapadnonilski virus. U skladu s navedenim pravilom piše se i naziv virusa čiji je latinski naziv Coronavirus koji je uzročnik aktualne epidemije. | Normative advice: In scientific and general usage, the names of viruses are adapted to the Croatian language. According to the orthographic rule about writing scientific terms, all one-word and multi-word terms that do not contain a name or possessive adjective derived from a personal name are written with small initial letters, e.g., ebola (‘ebola’), zika (‘zika’), virus zika (‘zika virus’), virus Zapadnoga Nila / zapadnonilski virus (‘the West Nile Virus’). Following this rule, the term denoting the virus called in Latin Coronavirus, which caused the pandemic in 2020 and 2021, is spelled koronavirus. |
| | Pragmatic advice: Često se čuje da je netko obolio od koronavirusa. U medijskim se izvještajima katkad govori o broju zaraženih, katkad o broju oboljelih. Oboje je pravilno, ali nije pravilno reći da je netko obolio od koronavirusa. Ljudi nisu oboljeli od virusa nego su njime zaraženi, a od bolesti obolijevaju. Možemo reći da je netko zaražen virusom ili da je obolio od koronavirusne bolesti ili od bolesti COVID-19. | Pragmatic advice: One often hears that someone has been sick with Coronavirus. In the media, we can sometimes read about the number of infected or the number of diseased. Both expressions are correct, but it is not right to say that someone is diseased with Coronavirus. Somebody cannot be diseased with a virus, only infected by it, and diseased with a disease. We can say that a virus has infected somebody or that somebody is sick with the Coronavirus disease or COVID-19. |
| epidemija (epidemic) | Epidemija označuje naglo širenje neke bolesti. Stoga, nije pravilno govoriti o pandemiji koronavirusa, nego treba govoriti o epidemiji koronavirusne bolesti, epidemiji bolesti COVID-19 ili u razgovornome stilu o epidemiji korone. | The epidemic denotes a widespread occurrence of an infectious disease. Thus it is not correct to say Coronavirus epidemic. It is right to say an epidemic of Coronavirus disease or an epidemic of COVID-19 or in the colloquial style epidemic of Corona. |
| pandemija (pandemic) | Pandemija označuje širenje neke bolesti na više država, kontinenata ili na cijeli svijet. Budući da pandemija označuje ‘širenje bolesti’, nije pravilno govoriti o pandemiji koronavirusa, nego treba govoriti o pandemiji koronavirusne bolesti, pandemiji bolesti COVID-19 ili u razgovornome stilu o pandemiji korone. | The pandemic denotes the spreading of a disease to several countries, continents, or the whole world. Thus it is not correct to say Coronavirus pandemic. It is right to say pandemic of Coronavirus disease or pandemic of COVID-19 or in the colloquial style pandemic of Corona. |


186 —— Milica Mihaljević, Lana Hudeček, Kristian Lewis 


Table 18: Entry anosmija in Mrežnik.

| Entry section | Croatian | English |
|---|---|---|
| definition | Anosmija je gubitak njuha, jedan od rjeđih simptoma bolesti COVID-19, najčešće se pojavljuje u mladih pacijenata s blagom kliničkom slikom. | Anosmia is the inability to smell, one of the less common symptoms of COVID-19 disease, most often occurring in young patients with a mild clinical picture. |
| synonym | anozmija | |
| pragmatic note | U medicinskome nazivlju upotrebljava se i naziv anozmija te je u bazi Strune anozmija preporučeni medicinski naziv, a anosmija dopušteni. U medijima se (povezano s pandemijom bolesti COVID-19) upotrebljava naziv anosmija. | The term anozmija is also used in medical terminology; in the Struna database anozmija is the recommended medical term, and anosmija is an allowed term. The term anosmija is used in the media (in connection with the COVID-19 pandemic). |
| external links | to Struna database; to Glossary of Coronavirus | |


7 Conclusions 


Coronavirus has influenced many walks of life, from health, medicine, sociology,
and psychology to education and language. Languages change over time, but in
2020 and 2021, change was more rapid than ever. The appearance of Coronavirus
resulted in the formation of many new words and phrases, the evolution of the
meaning of existing ones, and the passing of terms from the language of science
to everyday discourse. The media and social media had a crucial role in the creation
and spreading of these new terms.

As in many other languages, the beginning of the COVID-19 pandemic in Croatia
was associated with the appearance of many new words (koronabolnica ‘Corona
hospital’, superširitelj ‘super-spreader’) and phrases (socijalna distanca ‘social
distance’) as well as additional meanings of the existing ones (balon ‘balloon’, val
‘wave’). Native speakers, especially journalists, often asked linguists from the
Institute of Croatian Language and Linguistics for language advice. This was the reason
for language advice posted on the portals Jezični savjetnik and Bolje je hrvatski!, for
a special issue of the journal Hrvatski jezik connected with Coronavirus and
e-learning, for the compilation of the descriptive Glossary of Coronavirus, and for the
inclusion of Coronavirus-related neologisms in the Croatian Web Dictionary – Mrežnik.
In autumn 2021, Coronavirus-related terms were not yet included in a general publicly
available corpus of the Croatian language, but a small specialized Korona corpus had
been compiled and was available on request. Corona-related neologisms were also not yet
included in the then most recent Croatian dictionary of neologisms Rječnik
neologizama u hrvatskome (cf. Muhvić-Dimanovski/Skelin Horvat/Hriberski 2016).

Some of the new Coronavirus-related words/phrases/meanings still belong only
to the spoken jargon, and they will probably never enter the Croatian standard
language (kovidiot ‘covidiot’). On the other hand, some Coronavirus-related terms
already belong to the standard language (koronabolnica ‘Corona hospital’). This new
terminology presented numerous challenges for linguists and especially for
standardologists and lexicographers. Standardologists were faced with many questions
relating to orthography, grammar (morphology, syntax, and especially word-formation)
as well as to the lexis posed by journalists and general language users.


Sílvia Barbosa, Susana Duarte Martins
The neologisms of the COVID-19 pandemic in European Portuguese: From media to dictionary


1 Introduction 


Humanity has already been confronted with pandemics in the past, such as the
Spanish Flu (1918), the Asian Flu (1957), AIDS (1981), H1N1 (2009), Ebola (2014), and
Zika Virus (2015), just to name a few. COVID-19 (2020), however, has had a stronger
impact on the lives of people around the world, so that communication about new
treatments, new care, new concerns, and new behaviours was necessary. Whenever a
discovery about this social and clinical reality was made, new words and/or neological
expressions emerged at an extremely fast pace, “simultaneously a manifestation of
language evolution and the evolution of knowledge” (Lino 2019: 10).

While adjusting to the COVID-19 pandemic, people around the world started to 
talk about the “new normal” way of life, and they conveyed feelings and thoughts on 
the topic through social networks and traditional communication channels resorting to 
a set of specific linguistic strategies, such as metaphors and neologisms. 

The vocabulary in different domains and in everyday speech was expanded to ac- 
commodate a complex social, cultural, and professional phenomenon of changes. 
Therefore, this new life gave birth to a new language - the “coronaspeak”. 

According to Thorne (2020), the “coronaspeak” has three stages: first, it emerged
in the way medical aspects were communicated in everyday language; secondly, it
occurred when speakers verbalized the experiences they had undergone and “invented
their own terms”; finally, this “new” way of speaking emerged in the government and
authorities’ jargon, to ensure that the new rules and policies were understood, and
that the population adopted socially responsible behaviours.

In this paper, we will focus on the second stage, because we intend to take stock
of how speakers communicate and verbalize this new way of living, particularly on
social networks. We are also interested in the context in which the neologism (be it
a new word, a new meaning, or a new use) emerged, is used, and is understood. To this
end, we observe the occurrence of the new word(s) on social networks and in
dissemination texts (press) and compare it with what Portuguese digital dictionaries
have attested so far. Different criteria regarding the insertion of new units, the
inclusion date, and the lexicographic description of the entries in the dictionaries
will be debated.


2 Neology: theoretical and methodological core issues


Historically, neologisms have been the target of prejudice and stigmatization when 
confronted with the standard language (Boulanger 2010). Although speakers usually 
recognize the units of their language that may be considered as new, the concept of 
“neologism” is debatable. Back in 1976, Rey wondered whether “neologism” is a con- 
cept or simply a pseudo-concept, arguing that “there is obviously no neologism per se, 
but in relation to a set of arbitrarily defined uses” (1976: 17). For this linguist, the con- 
cept of neologism is methodological and pragmatic. 

Given the complexity of the concepts of neology (the process) and neologisms (the 
product), experts have divergent opinions about this topic. Therefore, this phenomenon 
has been examined from different angles over time, such as the studies carried out by 
Guilbert (1975), Rey (1976), Alves (1990), Cabré, Freixa and Solé (2002), Pruvost and Sa- 
blayrolles (2003), or Boulanger (2010). More recently, we highlight the works of Alves 
and Maroneze (2018), Jesus (2018), and Rio-Torto's (2020) analysis of the lexical renova- 
tion in Brazilian and European Portuguese. 

According to Alves and Maroneze (2018: 9), the challenge in defining neologism re- 
sides in the concept of novelty: “New word regarding what or whom?” When we speak 
about novelty, a new concept arises in the scope of lexical innovation: the “novelty 
feeling”, a criterion of psychological nature, that Guilbert (1975: 31) associates with neo- 
logisms to express the way speakers may experience the designation of new concepts. 
Guerrero Ramos (2017) and Lino (2019), in turn, argue that the “neological
feeling” is crucial to identify and delimit a neologism, despite its fluctuating character
(Sablayrolles 2006).

Pruvost and Sablayrolles (2003) consider neologisation a natural process, which
depends on several factors, such as age, the speakers’ experience, and the dynamic of
different periods, as is the case with the global COVID-19 pandemic. Following
Correia and Lemos (2005), we understand neology as (i) the natural ability to renew the
lexicon of a language by creating and incorporating new units; and (ii) the study
(observation, collection, description, and analysis) of neologisms that occur in
linguistic systems.

Neologisms first appear in speech and may eventually become fixed in
the language, thus losing their neologism status. As Guilbert puts it: “the repetition of
the act of creation establishes the individual neologism in the lexicon society; creation
is confirmed by a certain usage. The created term is then lexicalised and, at the same
time, loses its neologism quality¹ to become a socially established word” (1975: 49).
Hence, all words were once neologisms, as Jean-Claude Boulanger and Bernard
Quemada argue (Sablayrolles 2016).

Some authors also distinguish between occasionalisms and neologisms
(Dal and Namer 2016, Bueno Ruiz 2020) based on the stability function of a unit
in a linguistic system and its permanence time (Dressler 1981). While an occasionalism
is temporary and ephemeral, without dictionary attestation, a neologism, from
a diachronic point of view, is a unit that can be included in a dictionary at a
given moment and thus become part of the common lexicon of a language, losing its
neological status.

The entry into the linguistic system, made official by the registration in the lan- 
guage dictionary, of permanent and stable formations, resulting from a system need, 
mainly of denominative character, coincides with the moment when those units cease 
to be neologisms, according to Correia (1998). In this sense, Guerrero Ramos (2017: 
1399) claims that a unit is neological if it does not appear in the dictionary, therefore 
“the dictionary remains an effective means of measuring neology”. On the other hand, 
media plays an important role in the identification of neologisms as a vehicle for the 
dissemination of the standard language and is considered a reference model that can 
condition or encourage the use of certain linguistic trends (Freitas, Ramilo, Arim 2010) 
or, as Pruvost and Sablayrolles (2003: 9) state: “Press chronicles, more or less selective 
institutions and dictionaries also play their regulatory role to evaluate, channel, define, 
suggest, sometimes officially impose an adaptation or substitutes for the neologisms 
resulting from the daily turmoil”. 

Given the sophistication of neologisms, experts have come forward with many
proposals to categorize neological units. Sablayrolles (1997) addresses this issue,
presenting 12 types of neologism typologies alongside formation processes. Despite the
numerous existing typologies, Jesus (2018: 54) concludes that “in general, it is possible
to synthesize the formation processes in three fundamental types: formal, semantic
and loan processes”,² and advocates the importance of considering “pragmatic and
discursive factors as inherent to any neological unit and, consequently, to any
typologisation proposal”.


1 In the original text: “qualité de néologisme”. 

2 Formal neologisms are creations based on processes of derivation, composition, formation by acro- 
nyms, reduction of words or even in the creation of innovative roots (Boulanger 1979). In this group, 
Rey (1976) includes some borrowings, along with ex nihilo creations, morphological units, initialisms, 
and acronyms. When a new meaning is given to a form that already exists in the language, the neolo- 
gism is semantic and can be described by different types of novelty: total or partial (Rey 1976). 
When the neological units derive from the adoption of a foreign unit, they are known as borrowed
neologisms.


3 Neology in Portugal in a nutshell


In Portugal, neologism studies began in the 1980s, at NOVA CLUNL (‘Linguistics
Research Centre of NOVA University of Lisbon’), with the Observatoire du Français
Contemporain de Lisbonne (‘Observatory of Contemporary French in Lisbon’),
under the supervision of Teresa Lino. Neologisms were collected and processed as
computerized data in the Base de Neologismos do Português Contemporâneo
(‘Contemporary Portuguese Neologisms Base’) (Lino 1988, 2003). Later, the
Observatório do Português Contemporâneo (‘Contemporary Portuguese Observatory’) was
founded with the aim of creating a bank of Portuguese neologisms, with the same
methodological principles as the French observatory, followed by the
Observatório de Neologia e de Terminologia em Língua Portuguesa, Neoporterm
(‘Observatory of Neology and Terminology in Portuguese Language’). In 2004, the
Observatório de Neologia, ONP (‘Neology Observatory’), was created by Margarita Correia
at ILTEC (currently CELGA-ILTEC, ‘Center for the Study of General and Applied
Linguistics’), as part of the observatories’ network of the Neologia das Línguas
Românicas, NEOROM (‘Neology of Romance Languages’), a project coordinated by Teresa
Cabré (Correia et al. 2006).

Despite several advances in the neology work in Portugal, both projects (at NOVA 
CLUNL and CELGA-ILTEC) are on standby. 


4 European Portuguese dictionaries 


The different evolution stages traversed by dictionaries over time reflect “an
attitude towards language(s) and a reflection on language(s) itself (themselves): the
dictionary has been a cultural object ever since it was created and has strived to define
the lexical corpus of a language with a descriptive, didactic, and sometimes normalizing
perspective”, as Lino (2018: 609) clarifies. The dictionary has an extremely important
social role in the preparation of an individual for society. It is an example of balance
between the correct use of language and its variation, whether dialectal or orthographic,
and a guarantor of lexical reliability, without which no norm or rule survives.

In Portugal, there is no normative language policy, nor an institution with legal
competence to determine the linguistic norm, despite the existence of the Academia
das Ciências de Lisboa (‘Lisbon Academy of Sciences’), which provides the Portuguese
government with consultancy in linguistic and scientific matters of national interest.
Consequently, there are no normative dictionaries, unlike what happens, for
example, in Spain, where one can find several scientific academies (Real Academia
Española, “Royal Spanish Academy”; Institut d’Estudis Catalans, “Institute of Catalan
Studies”; Real Academia Galega, “Galician Royal Academy”; Euskaltzaindia,
“Royal Academy of the Basque Language”) with legal powers to determine the norm
of their languages, and where normative dictionaries are available to the public.

The European Portuguese dictionaries only have a descriptive status and, in some 
cases, a status of reference dictionaries, which allows the regulation of linguistic prod- 
ucts despite not having legal competence to establish the standard. However, the 
speakers acknowledge Portuguese dictionaries as reliable sources of the intended cor- 
rect uses of the language (Correia 2009). 

Usually, paper-based dictionaries have an introduction, a preface and/or a user's 
guide, where questions regarding objectives, methodology, macro and microstruc- 
ture, the insertion of new lexical units, number of entries, among others may be ad- 
dressed. The kind of lexicographic information shared with users, as well as the 
extension of it, depends on the dictionary's scope. Regarding the new entries, and 
since a unit ceases to be neological once it is attested in a dictionary, Correia (1998) 
strongly disagrees with the inclusion of the label “neol.”, often used in the micro- 
structure of paper-based dictionaries to indicate the most recent units, defending 
that it is “theoretically more correct, to choose the date of the first attestation of the 
registered unit” (idem: 62). The Dicionário do Português Atual Houaiss, DPAH (2011,
‘Houaiss Dictionary of Current Portuguese’), includes both the dates of
the entries and the label “neol.”.

The Portuguese online dictionaries, Dicionário Infopédia da Língua Portuguesa
(DILP, ‘Portuguese Language Infopédia Dictionary’) and Dicionário Priberam da Língua
Portuguesa (DPLP, ‘Priberam Dictionary of the Portuguese Language’), do not explicitly
mention the objectives of the works, the methodology used, or details about how
new lexical units are inserted, nor whether there is a temporal register associated
with new entries. However, the DPLP provides more lexicographic information
(e.g., number of entries) and has a detailed user guide³ when compared with the DILP.

Unlike what happens with paper-based dictionaries, where the different editions 
serve as a timestamp to identify diachronic changes, it is difficult to verify aspects such 
as the insertion date of a neologism, or the introduction of new meaning through refor- 
mulation or adaptation of definitions in the Portuguese e-dictionaries, given that the 
lexicographic criteria governing the insertion of units are unclear. 

Therefore, in order to find out when a new unit is included in the Portuguese digi- 
tal dictionaries, we have to perform a diachronic analysis of their entries. Taking 
COVID-19 as an example, we have evidence from both dictionaries of the introduction 
of at least one new related word between December 2020 and April 2021. 

In 2020, there were two units associated with the new pandemic in the DILP 
(Figure 1). 


3 Available on: https://dicionario.priberam.org/consultar.aspx.


[Screenshot of the DILP lookup list for “covid”, showing covid-19, COVID-19, covidiano, and covídico.]

Figure 1: Entries associated with COVID-19 in the DILP on 07/12/2020. Source: DILP.


In 2021, covidário comes up as a new unit in the DILP (Figure 2). 


[Screenshot of the DILP lookup list for “covid”.]

Figure 2: Entries associated with COVID-19 in the DILP on 09/04/2021. Source: DILP.


In 2020, covidário was already an entry of the DPLP, as we can observe in Figure 3. 


[Screenshot of the DPLP lookup list for “covid”, showing COVID-19, covidário, covidiano, and covídico.]

Figure 3: Entries associated with COVID-19 in the DPLP on 07/12/2020. Source: DPLP.


In 2021, the DPLP added covid-drive to its lemma list (Figure 4).


[Screenshot of the DPLP lookup list for “covid”.]

Figure 4: Entries associated with COVID-19 in the DPLP on 09/04/2021. Source: DPLP.


From the observation of the two dictionaries, at first glance, we could say that the 
DPLP seems to have different criteria from the DILP, allowing a faster insertion of new 
units: the entry covidário was already in the DPLP when the DILP included it in its 
lemma list. 
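
The comparison above amounts to a set difference between two dated snapshots of a lemma list. As a minimal sketch, assuming the lookup lists were saved by hand at each consultation date (neither dictionary is assumed to offer an export or API for this), the additions between two dates could be computed as follows. The snapshot contents are illustrative, transcribed loosely from the running text and Figures 1-4, and the names SNAPSHOTS and new_lemmas are hypothetical.

```python
# Illustrative lemma snapshots (ISO dates), loosely transcribed from the text
# and Figures 1-4. They are NOT an export from either dictionary.
SNAPSHOTS = {
    "DILP": {
        "2020-12-07": {"COVID-19", "covidiano", "covídico"},
        "2021-04-09": {"COVID-19", "covidiano", "covídico", "covidário"},
    },
    "DPLP": {
        "2020-12-07": {"COVID-19", "covidário", "covidiano", "covídico"},
        "2021-04-09": {"COVID-19", "covidário", "covidiano", "covídico",
                       "covid-drive"},
    },
}


def new_lemmas(snapshots, earlier, later):
    """Return, per dictionary, the lemmas present at `later` but not at `earlier`."""
    return {
        name: sorted(dates[later] - dates[earlier])
        for name, dates in snapshots.items()
    }


additions = new_lemmas(SNAPSHOTS, "2020-12-07", "2021-04-09")
print(additions)
```

With these illustrative snapshots, the result singles out covidário for the DILP and covid-drive for the DPLP, in line with the observations reported above. The sketch only detects new entries; new meanings added to existing entries would still require a manual comparison of the definitions.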

Despite these constraints, the DILP and DPLP are two digital reference works for 
the contemporary Portuguese language and, therefore, a reliable source of comparison 
between the units attested in the dictionaries and the neological units registered in 
media and social networks, since dictionaries also use them as part of their corpora. 


4.1 Dicionário Infopédia da Língua Portuguesa - DILP


The DILP has both a paper-based and a digital version; the latter is incorporated in
the Infopédia website: www.infopedia.pt/.

On the Infopédia page, we have access to many linguistic resources: monolingual
dictionaries,⁴ bilingual dictionaries,⁵ a multimedia encyclopedia, a Portuguese
orthographic vocabulary, a spelling converter, translation and spelling games, trivia
(“rare words, expensive words”), language doubts, and the “word of the year”. Users
can also send comments or suggestions about any resource.


4 Eleven Portuguese language dictionaries: Portuguese language dictionaries with and without spell- 
ing agreement, Portuguese sign language dictionary, dictionary of Portuguese verbs, dictionary of acro- 
nyms and abbreviations, toponymy dictionary, dictionary of proper names (anthroponymy), dictionary 
of medical terms, dictionary of Latin phrases and foreign expressions, basic illustrated dictionary, dic- 
tionary of Portuguese for foreigners. 

5 Nine bilingual dictionaries: Chinese, Dutch, English, French, German, Greek, Italian, Spanish,
Tetum, and two verb dictionaries of English and French.

The DILP does not provide any official information about the total number of
entries. However, the corresponding paper version, Dicionário Moderno da Língua
Portuguesa (2008, ‘Modern Dictionary of the Portuguese Language’), published by
Porto Editora, records “32,500 entries, phrases and idioms, providing
more than 2,700 examples and 66,500 definitions”,⁶ and, in an announcement regarding
the launch of the digital version of the dictionary in 2007, the publishing
house stated that the DILP had more than 240,000 definitions available,⁷ so we
know that at the time the DILP’s lemma list had fewer entries than its paper-based
counterpart. Porto Editora sells a higher number of paper-based dictionaries
(including more thematic/technical dictionaries) and other products (grammar
books and handbooks) than the ones available on the Infopédia website.

The microstructure of the DILP comprises phonetic transcription, syllabic division,
etymology, grammatical information, usage marks (e.g. colloquial, regionalism,
taboo, slang, other linguistic varieties of Portuguese, etc.), synonyms, antonyms,
related words and anagrams, foreign words, idioms, and some examples or expressions to
illustrate contexts. The entries are also linked to information in sign language and
references to articles from other dictionaries or its encyclopedia. In the Dicionário de
Português para Estrangeiros, DPE (2020, ‘Dictionary of Portuguese for Foreigners’),
users can also listen to the words’ pronunciation.

Despite the omission of the total number of DILP lemmas, as regular users, we
have the perception of an increase in available lemmas, justified by the creation of new
dictionaries in recent years, like the DPE. Infopédia enables access to a great deal of
linguistic information through numerous resources. Besides that, it is user-friendly and
has an inviting image.


4.2 Dicionário Priberam da Língua Portuguesa — DPLP 


The DPLP only has an online version, available through the Priberam website:
https://dicionario.priberam.org/, along with other resources, like FLiP⁸ and LegiX.⁹

On the Priberam webpage, we have access to several linguistic resources: translation
assistants (English, French, Spanish); verb conjugator (European/Brazilian
Portuguese and Spanish); spelling agreement converter; syntactic and spell checker
(European/Brazilian Portuguese and Spanish); grammar; vocabulary, with two distinct
lexical bases for European Portuguese and Brazilian Portuguese.


6 <https://www.portoeditora.pt/produtos/ficha/dicionario-moderno-da-lingua-portuguesa/200124>;
last access: August 5, 2021.

7 <https://www.portoeditora.pt/noticias/dicionario-da-lingua-portuguesa-gratuito-na-internet/759>;
last access: August 8, 2021.

8 Several linguistic resources for the Portuguese language: https://www.flip.pt/, last access: August 8,
2021.

9 Portuguese legal databases, chosen by the largest law firms operating in Portugal.

The DPLP is a lexicographic product that resulted from the paper dictionary Novo
Dicionário Lello da Língua Portuguesa (1996-1999, ‘New Lello Dictionary of the
Portuguese Language’). It contains 133,000 entries, “including phrases and phraseologies,
whose lemma list comprises the general vocabulary and the most common terms of the
main scientific and technical areas”,¹⁰ according to the introduction of the dictionary
on its webpage.

The dictionary allows users to customize it according to the desired Portuguese
variety (European or Brazilian standards), and one can also choose whether or not to use
the spelling agreement (“acordo ortográfico”) version,¹¹ depending on specific needs.

After setting the preferences, the DPLP offers users options of autocomplete
search and spell check, and the entries include information about the morphological
analysis, search (cross-reference) in the definitions, verb conjugator, related words,
translation assistants, similar and nearby words (or neighbouring words), the
occurrence of the unit in other entries, as well as the real use of the word in blogs,
media, and Twitter. It also contains data concerning etymology and pronunciation cues,
grammar information, usage marks (e.g., colloquial, informal, regionalism, slang,
other linguistic varieties of Portuguese, etc.), synonyms and antonyms, anagrams,
foreign words, idioms, and contexts, among others.

Having a quick look at the pros and cons of Priberam, all the information displayed 
in the DPLP is free, except for lexicographic resources associated with FLiP. It is a user- 
friendly resource, and there is a tutorial that guides us along the different sections and 
types of searches that the dictionary enables. If we are seeking informal uses of lan- 
guage and contexts of words, the DPLP is a good choice, since it shows us the word in 
examples taken from blogs, media, and Twitter, and allows the user to have more up- 
to-date information regarding contexts of informal uses of language, specifically slang, 
when compared to the DILP. 


10 <https://dicionario.priberam.org/sobre.aspx>; last access: August 5, 2021. 

11 The “new” spelling agreement (also known as the spelling agreement of 1990) is mandatory in Portugal,
Brazil, Cape Verde, São Tomé and Príncipe, but still under discussion in Angola and Mozambique. As
for Guinea-Bissau, Equatorial Guinea and East Timor, the governments’ priority is that the population
speak the official language (Portuguese), and the spelling agreement comes with the education/linguistic
policies that implement the Portuguese language in the country (if communities have access to student
books, grammars, and other linguistic resources from countries where the spelling agreement is
applicable, or through teachers from those countries).


5 Methodology


In this work, we intend to: (i) verify which lexical units relating to the
COVID-19 period emerged in the media and social networks, and which of these units
were included in a dictionary at a given moment; and (ii) observe which lexical
units were attested by the dictionaries in a very short period to meet the users’
needs.

The neologism candidates were extracted from media (newspapers Público and
Expresso) and social networks (such as Facebook and Twitter) in the period
between December 2019, considered as the beginning of the pandemic, and July 2021.

Considering Sablayrolles’s statement that “a word enters a dictionary because it is
no longer neological” and “a word is neological because it is not in the dictionary”
(2006: 141), we have selected the two digital dictionaries of European Portuguese
mentioned above, DILP and DPLP, to confirm whether the extracted units are neologisms
or not.

These dictionaries were chosen for the following reasons: they (i) are freely avail- 
able (despite being owned by two publishing houses: Porto Editora and Priberam), 
(ii) possess a considerable lemma list, (iii) are representative of the lexicon of Euro- 
pean Portuguese, (iv) are targeted to the Portuguese and lusophone audience. 


5.1 Criteria for the selection of neologism candidates


The process of identifying and collecting neologism candidates was carried out in two
stages, manual and semi-manual, and was based on four criteria: diachronic,
psychological, systematic instability, and lexicographic (Cabré 1993), which helped us to
delimit the units considered as neologism candidates.

Following Cabré (1993), a unit is neological if it has appeared recently (diachronic 
criterion). On the other hand, as Guilbert (1975) declares, a unit is felt as new at a given 
moment by the speakers of a particular linguistic community compared to the language 
stage immediately before (psychological criterion), whether it is a new orthographic en- 
tity, a new meaning, or an update of meaning. This criterion, dubbed as “novelty char- 
acteristic”, is responsible for the immediate delimitation of a candidate. 

Cabré (1993) refers to the formal instability of the neologism as relevant to its
classification: a unit will be considered neological if, cumulatively, it shows signs of
morphological, phonetic, or spelling instability. Different spellings and graphic
markings, or hesitation concerning grammatical gender or pronunciation, for example,
reveal insecurity towards the use or existence of specific units in the language.

Finally, we follow the lexicographic criterion, according to which the candidate unit
is neological if it is not yet registered in a language dictionary, either at the level of
the entry or of the meaning. The inclusion of the unit in a dictionary at a given moment
reveals that it has lost its neological nature.
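
The lexicographic criterion can be read as a membership test against the lemma lists of the reference dictionaries. The following is a minimal sketch under stated assumptions: the lemma lists are small hypothetical samples compiled by hand (not data from the DILP or DPLP), and the helper names normalize, attesting_dictionaries, and is_neological are ours. It covers only the entry level; a new meaning of an attested form still has to be checked manually against the definitions.

```python
import unicodedata

# Hypothetical sample lemma lists, for illustration only.
SAMPLE_LEMMAS = {
    "DILP": {"COVID-19", "quarentena"},
    "DPLP": {"COVID-19", "quarentena", "covidário"},
}


def normalize(unit):
    """Normalize accents (NFC) and case so that spellings compare reliably."""
    return unicodedata.normalize("NFC", unit).casefold().strip()


def attesting_dictionaries(candidate, lemma_lists):
    """Return the names of the dictionaries whose lemma list contains the candidate."""
    key = normalize(candidate)
    return sorted(
        name
        for name, lemmas in lemma_lists.items()
        if key in {normalize(lemma) for lemma in lemmas}
    )


def is_neological(candidate, lemma_lists):
    """Entry-level lexicographic criterion: unattested in every dictionary."""
    return not attesting_dictionaries(candidate, lemma_lists)


for unit in ("vacinódromo", "covidário", "quarentena"):
    print(unit, attesting_dictionaries(unit, SAMPLE_LEMMAS),
          is_neological(unit, SAMPLE_LEMMAS))
```

Returning the list of attesting dictionaries, rather than a single yes/no answer, keeps visible the cases in which only one of the two dictionaries has already registered a unit.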


5.2 Selection of neologism candidates


Once the criteria were defined, we started collecting candidates. As mentioned
above, since the Portuguese neology observatories are on standby, the solution was to
undertake a manual or semi-manual selection. Although time-consuming, it was
considered indispensable, since only in this way could we successfully identify cases of
semantic and formal neology (Correia & Lemos 2005).

First, whenever a unit “felt” to be neological was found, it was introduced in the 
candidate list with the information considered relevant in the candidate form, as 
shown in Table 1: 


Table 1: Candidates related to álcool and gel.

| candidate | N.º | context | source | date |
|---|---|---|---|---|
| álcool-gel (alcohol-gel) | 0038 | A distância das mesas, a separação dos irmãos nos recreios, as setas no chão, os frascos de álcool-gel e até o ecrã da televisão com os números da covid-19 aparecem desenhados. | Newspaper Expresso | 2020/09/29 |
| álcool gel (alcohol gel) | 0032 | Metro e autocarros cheios de regras e de lugares vazios, viajantes mascarados cobertos de respeito, fiscais de poucas multas e de muita pedagogia, ou máquinas automáticas com chocolates, pastilhas elásticas, máscaras, luvas e álcool gel. Eis o retrato dos transportes públicos do Porto, onde um autocarro seguia sem passageiros para Sonhos. | Newspaper Expresso | 2020/03/19 |
| álcool em gel (alcohol in gel) | 0019 | A LMVH proprietária da Louis Vuitton, começou a produzir e a distribuir gratuitamente álcool em gel, em França. A iniciativa procura responder ao risco de falta de desinfectantes, em todo o país, para proteger a população da propagação do novo coronavírus. | Newspaper Público | 2020/03/19 |
| gel desinfe(c)tante (disinfectant gel) | 0006 | Os receios face ao vírus quadriplicaram as vendas de Fevereiro de sabão e gel desinfectante, incluindo aqueles com base de álcool, face ao ano passado, segundo os dados mostrados pela gigante do retalho Lotte Mart. | Newspaper Público | 2020/03/04 |
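
The candidate form in Table 1 can be thought of as a small record, and competing spellings of the same unit can then be grouped as a hint of the spelling instability discussed in 5.1. This is a minimal sketch, assuming a simple key that ignores case, hyphens, and spacing; the class Candidate and the functions variant_key and group_variants are hypothetical names, and the context field of the form is left out for brevity.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    form: str     # candidate unit as found in the source
    number: str   # form number (N.º) in the candidate list
    source: str   # newspaper or social network
    date: str     # YYYY/MM/DD, as in Table 1


RECORDS = [
    Candidate("álcool-gel", "0038", "Expresso", "2020/09/29"),
    Candidate("álcool gel", "0032", "Expresso", "2020/03/19"),
    Candidate("álcool em gel", "0019", "Público", "2020/03/19"),
    Candidate("gel desinfe(c)tante", "0006", "Público", "2020/03/04"),
]


def variant_key(form):
    """Collapse case, hyphens, and whitespace so graphic variants share one key."""
    return re.sub(r"[-\s]+", " ", form.casefold()).strip()


def group_variants(records):
    """Map each key to the list of attested spellings, in input order."""
    groups = defaultdict(list)
    for record in records:
        groups[variant_key(record.form)].append(record.form)
    return dict(groups)


groups = group_variants(RECORDS)
unstable = {key: forms for key, forms in groups.items() if len(forms) > 1}
print(unstable)
```

Here only álcool-gel and álcool gel fall under the same key. The prepositional variant álcool em gel stays apart because the key is purely graphic, so deciding whether it belongs to the same unit remains a matter for manual analysis.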


From the final extraction, we obtained a list of candidates of different typologies
and formation processes: álcool-gel (‘alcohol-gel’), ano pandémico (‘pandemic
year’), antigo normal (‘old normal’), antimáscaras (‘antimasks’), bolhas domésticas
(‘domestic bubbles’), centro de vacinação (‘vaccination center’), comportamentos
de risco (‘risk behaviours’), confinamento (‘lockdown’), desconfinamento (‘lifting
lockdown’), drone pandémico (‘pandemic drone’), escola virtual (‘virtual school’),
estado de emergência (‘emergency state’), fase de mitigação (‘mitigation phase’),
fraudemia (‘fraudemic’), geração pandemia (‘pandemic generation’),
hidroxicloroquina (‘hydroxychloroquine’), imunidade de grupo (‘group immunity’), janelas do
confinamento (‘lockdown windows’), kit de diagnóstico (‘diagnostic kit’), língua covid
(‘covid language’), mapa pandémico (‘pandemic map’), negacionista (‘negationist’),
plano nacional de testagem (‘national testing plan’), quarentena (‘quarantine’),
recém-vacinados (‘newly vaccinated’), supercontagiadores (‘super contagious’), testes
sorológicos (‘serological tests’), uberização (‘uberization’), vacinódromo (‘vaccinedrome’),
zaragatoa (‘swab’).

Subsequently, candidates were lemmatized to facilitate their registration and allow 
a more efficient analysis. For reasons of space and scope, we will only discuss the re- 
sults derived from the selected units: coronavirus, COVID-19, pandemia and the prefix 
tele-. These units were chosen due to the simple identification of the candidate, and 
their associations with the disease, such as different units related to the designation of 
the disease (coronavirus, COVID-19), units used as a metonym for a specific disease 
(pandemia), and a prefix unit (tele-) associated with performing certain tasks in the so- 
called “new normal” or “post-pandemic scenario”. 


6 Analysis 


In this section, we are going to dwell on the lexicographic representation of four 
units related to the pandemic. The selection was based on the most generic units ana- 
lyzed when the subject is the COVID-19 pandemic (coronavirus, COVID-19, pandemic). 
Additionally, the particular interest in understanding the impact of COVID-19 on peo- 
ple’s lives in the technological age justifies our choice of the prefix tele- as a potential 
promoter of neologisms. 

Initially, we confirmed whether the four selected units were included in the
lemma lists of the DILP and DPLP. Subsequently, we observed the microstructure of
the entries, namely the content of the definitions and the type of words related to the
units under study in both dictionaries (6.1). In the final stage of our research, we
present the neologism candidates collected from different sources and discuss their
formation processes (6.2).


6.1 Lexicographic description of the units: coronavirus, COVID-19, pandemic, and the prefix tele-


We now turn to the four selected examples, compare their definitions in the
DILP and DPLP, and discuss the approach to the same units in both dictionaries.

After commenting on the definition of the units under study, we carried out a
comparative analysis of the microstructure of both dictionaries, based on two types of
search processes. First, we performed a search in the lookup window of each
dictionary, where a set of words starting with the same characters (e.g. “coronav”) is
displayed (Figure 5).


[Screenshots of the DILP and DPLP lookup windows for the query “coronav”, with suggestions such as coronaviral, coronavirologista, coronavirose, and coronavírus.]

Figure 5: Example of a search result with the initial characters of coronavirus in the lemma list of the
DILP (left) and DPLP (right).


Then, we observed the related words within each dictionary entry under study
(Figure 6).


[Screenshots of the coronavírus entries. The DILP entry lists, under ‘veja também’: alfacoronavírus, anticovid-19, betacoronavírus, coroa, COVID-19, deltacoronavírus, gamacoronavírus, MERS, SARS. The DPLP entry lists, under ‘palavras relacionadas’: covídico, coronaviral, síndrome, SARS, coronavirologia, coronavirologista, covidiano; under ‘parecidas’: corona vírus, coronários, coronárias; under ‘palavras vizinhas’: coronavirologia, coronavirologista, coronavirose, coronavírus, coronavisor.]

Figure 6: Example of the entry and respective suggestions of other words in the DILP (left) and DPLP
(right).


The lookup window of the DILP displays other words in alphabetical order (up to 10
results), and the page of the entry allows users to see a set of related words (‘veja
também’, ‘see also’). In the DPLP, the lookup window suggests other words in alphabetical
order (up to 6 results), while on the page of the entry, we have related words (‘palavras
relacionadas’), similar words (‘parecidas’), and nearby/neighbouring words (‘palavras
vizinhas’).

In a preliminary analysis, we checked the suggestions in the lookup window
and compared them with the related words for each of the four entries in both
dictionaries between April and July 2021. During the analysis, we discarded
words that are not semantically related and, for the sake of length, words that show up
simultaneously in the two types of searches, as well as words that occur with other units
under analysis simply because the lemma occurs within the definition of another entry (as
in the DPLP section “this word in the dictionary”). Ultimately, this diachronic analysis
did not reveal significant differences in the referred period, so we will only
present a summary of our findings for each unit (cf. Tables 2-5).


6.1.1 coronavírus 


The lemma coronavírus is included in both dictionaries. The entries give information on
etymology, gender (it is a masculine noun, with no graphic variation or gender instability)
and number (the same form serves for singular and plural), and they associate the lemma with
the domains of medicine and biology (Figure 7).


[Screenshots of the two dictionary entries; the definitions are translated in footnotes 12 and 13.]


Figure 7: The entry of the lemma coronavírus in the DILP¹² (left) and DPLP¹³ (right).


The definition of coronavírus in both dictionaries stresses the following characteristics
of the lemma: (i) it is a common designation of a certain family of viruses; (ii) the virus
causes a set of symptoms; (iii) it has the shape of a crown (‘coroa’; ‘corona’ in Latin).

Additionally, the DILP includes encyclopedic information in the definitory text, naming
diseases caused by different coronaviruses: COVID-19, MERS (acronym of Middle East respiratory
syndrome; in Portuguese, ‘síndrome respiratória do Médio Oriente’) and SARS (acronym of Severe
Acute Respiratory Syndrome; in Portuguese, ‘síndrome respiratória aguda grave’). The fact that
this definition includes the recent coronavirus disease (COVID-19) suggests that the definition
was reformulated or updated to accommodate a new concept.


12 DILP definition: ‘common designation, extended to any of the viruses of the Coronaviridae family,
capable of infecting animals and humans, causing respiratory and digestive diseases (among those that
affect humans, there are COVID-19, the Middle East respiratory syndrome or the severe acute respiratory
syndrome) and which, when viewed under a microscope, have a characteristic morphology reminiscent
of the shape of a crown.’

13 DPLP definition: ‘Designation given to several viruses with RNA as a genetic material, whose shape
resembles a crown, which are a common cause of mild to moderate respiratory infections, but also of
severe atypical pneumonia.’

Let us have a look at the data retrieved from both dictionaries (Table 2) regarding 
the related and nearby/neighbouring words (cf. Figure 7): 


Table 2: Data retrieved from the DILP and DPLP regarding
coronavírus related and nearby words.


14 units retrieved from coronavírus    DILP    DPLP
alfacoronavírus                        ✓       ✓
betacoronavírus                        ✓       ✓
coroa                                  ✓       ✓
corona                                 ✓       ✓
coronafobia                            ✓       ✗
coronafóbico                           ✓       ✗
coronaviral                            ✓       ✓
coronavirologia                        ✗       ✓
coronavirologista                      ✗       ✓
coronavirose                           ✗       ✓
coronavírus                            ✓       ✓
coronavisor                            ✗       ✓
deltacoronavírus                       ✓       ✓
gamacoronavírus                        ✓       ✓
Total of results                       10      12


Comparing the units retrieved from coronavírus in both dictionaries, one can observe that
the DPLP seems to have more coronavírus-related entries than the DILP. However, the DPLP
includes two units that have no definition available: coronavirose (‘canine and feline
coronavirus’) and coronavisor (possibly referring to a face shield against corona, although
‘viseira’ is more often used to designate the same device). If we consider only words with
definitions, the DPLP and DILP are even.

Finally, we observe that coronavirose, coronavisor, coronavirologia (‘coronavirology’)
and coronavirologista (‘coronavirologist’), as well as coronafobia (‘coronaphobia’) and
coronafóbico (‘coronaphobic’), show traces of instability in their fixation as entries, each
being attested in only one of the two dictionaries. We therefore believe that these units are
losing their neologism status, as they are in the process of being included in dictionaries.


6.1.2 COVID-19


The DILP displays COVID-19 and covid-19 in a single entry. The DPLP, on the other hand,
attests two entries for the same concept, COVID and COVID-19; the latter, however, has no
definition attached: users are informed that COVID-19 is not in the dictionary and are invited
to suggest the ‘inclusion of the searched word in the dictionary’. Until very recently, the
lemma COVID-19 had a definition in the DPLP; now it appears solely in the entry of COVID, which
is described as a reduction and synonym of COVID-19, alongside the observation that it can also
be spelt in lowercase: covid.


[Screenshots of the entries COVID-19, covid-19 (DILP) and COVID (DPLP); the definitions are
translated in footnotes 14 and 15.]


Figure 8: The entry of the lemma COVID-19 in the DILP¹⁴ (left) and DPLP¹⁵ (right).


Looking at both definitions, we notice that the two dictionaries favour the feminine gender,
despite a note in the DPLP regarding the possible occurrence of the masculine gender, a
tendency also observed in other Romance languages. Besides gender instability, this lemma
presents spelling variants concerning the use of uppercase or lowercase, uppercase being the
preferred form in both dictionaries. The lemma is categorized as a noun and derives from an
English acronym, a piece of information omitted in the DPLP, where the etymology of the lemma
is absent. It is associated with the domain of medicine, yet the entry COVID in the DPLP adds
that this shorter form of COVID-19 is also informal.

These definitions emphasize the following characteristics of the lemma: (i) the year of
the outbreak (only in the DILP, as there was an update on the lemma definition in the DPLP);
(ii) the type of illness: a respiratory (infectious, in the definitory text of the DPLP)
disease, with symptoms (mild to severe); (iii) the seriousness of the illness: respiratory
failure (in the DPLP), eventual death (in the DILP); (iv) the cause of the disease: the
SARS-CoV-2 coronavirus; (v) the origin of the disease (China) and its status (pandemic status
in 2020), only in the DILP.


14 DILP definition: ‘respiratory disease caused by a coronavirus (SARS-CoV-2), which presents
variable symptoms, from asymptomatic cases or forms of mild intensity (whose symptoms may include
fever, cough, fatigue or muscle pain) to severe situations (especially in the elderly or people with
pre-existing health problems), which can evolve into scenarios of pneumonia, multiple organ failure and
eventual death; (initially identified in China in 2019, it reached pandemic status in 2020).’

15 DPLP definition: ‘An infectious respiratory disease caused by the SARS-CoV-2 coronavirus whose
symptoms may include fever, cough, breathing difficulties, and tiredness, and which in some cases
may progress to pneumonia or respiratory failure.’

Concerning the lexicographic representation of SARS-CoV-2, we verify that, while the lemma
SARS is attested as a dictionary entry, the specific coronavirus SARS-CoV-2 is not, despite
occurring in the definitory text of both dictionaries. The fact that SARS-CoV-2 is a medical
term with an acronymic basis, more complex in terms of structure and even of verbalization in
European Portuguese, may explain the preference for the form COVID-19 or, simply, COVID in
current language dictionaries. When we look up SARS-CoV-2 in the Dicionário de Termos Médicos
(‘Dictionary of Medical Terms’), the search panel immediately cross-refers us to COVID-19, the
entry in whose definition the term is used. On the other hand, when we look up SARS-CoV in the
same dictionary, no results are returned, but if we check the entry of SARS, we find that
SARS-CoV is part of its definition. Thus, the search results that the dictionary returns to
its users are not standardized.

As for data regarding the related and nearby/neighbouring words (cf. Figure 8), 
we have identified eight units in both dictionaries (Table 3): 


Table 3: Data retrieved from the DILP and DPLP regarding
COVID-19 related and nearby words.


8 units retrieved from COVID-19    DILP    DPLP
anticovid                          ✗       ✓
anticovid-19                       ✓       ✗
COVID                              ✗       ✓
COVID-19                           ✓       ✗
covidário                          ✓       ✓
covidiano                          ✓       ✓
covídico                           ✓       ✓
covid-drive                        ✗       ✓
Total of results                   5       6


The results show that the lexicographers of the DILP and DPLP made different decisions
about the treatment of equivalent units: the DILP includes the entry with a specific element
(the number), as in COVID-19 and anticovid-19, while the DPLP presents the same units without
that element. Here too, the DPLP lists slightly more units than the DILP.
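
The tally in Table 3 can be reproduced with a simple set comparison. The following is a minimal sketch, not the procedure used in this study: it assumes the presence marks of Table 3 as transcribed here, and the names `DILP`, `DPLP` and `compare` are illustrative only.

```python
# Units attested in each dictionary, transcribed from Table 3.
# All names below are illustrative assumptions, not part of either dictionary.
DILP = {"anticovid-19", "COVID-19", "covidário", "covidiano", "covídico"}
DPLP = {"anticovid", "COVID", "covidário", "covidiano", "covídico", "covid-drive"}


def compare(first: set, second: set) -> dict:
    """Split the units into shared ones and ones attested in a single dictionary."""
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


result = compare(DILP, DPLP)
print("totals:", len(DILP), len(DPLP))
for group, units in result.items():
    print(group, units)
```

Keeping the two lemma lists as sets makes the asymmetry explicit: the units found in only one dictionary are exactly those that differ by the presence or absence of the number (COVID-19 vs. COVID, anticovid-19 vs. anticovid), plus covid-drive.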
Despite being treated as synonyms in the DILP and DPLP, covidiano (‘covidian’) and
covídico (‘covidic’)¹⁶ may present different semantic values in dictionaries and social
networks. In fact, in social networks, covidiano shows up in contexts where the lemma results
from wordplay on the phonologically similar Portuguese unit quotidiano (‘daily’). As for
covid-drive, one can say that the lemma is losing its novelty, since it is already part of the
lemma list of one of the two dictionaries.


6.1.3 pandemia 


The lemma pandemia is part of the lemma lists of the DILP and DPLP. The entries display
information about etymology and gender (it is a feminine noun, showing no graphic variation or
gender instability), and the DILP associates the lemma with the domain of medicine (Figure 9).


[Screenshots of the two dictionary entries; the definitions are translated in footnotes 17 and 18.]


Figure 9: The entry of the lemma pandemia in the DILP¹⁷ (left) and DPLP¹⁸ (right).


The definitions of the lemma pandemia highlight the following characteristics: (i) it is
an outbreak (DPLP) of an (infectious) disease; (ii) it spreads worldwide; (iii) it affects a
high number of people; (iv) it does so simultaneously.

The unit pandemia entered the Portuguese language in 1873 (DPAH) and occurs 73 times in
the CETEMPúblico corpus (data from 1991-1998).¹⁹ All the occurrences are associated with the
HIV pandemic, the 1993 cholera pandemic, and the 1918 Spanish flu pandemic. Comparing our
research with these data, one may conclude that pandemia is used as a synonym of SARS-CoV-2
(identified in 2019), coronavírus, or COVID-19. By contrast, although coronaviruses were
discovered in 1965, the lemma coronavírus is absent from the DPAH and CETEMPúblico.


16 Even though Google retrieves 154,000 and 10,300 occurrences of covidian and covidic (on 15/08/2021),
these lemmas are absent from the main online dictionaries of English (Cambridge, Collins,
Dictionary.com, Macmillan, Merriam-Webster, Oxford).

17 DILP definition: ‘Infectious disease that spreads worldwide; disease that attacks a large number of
people in a large number of countries at the same time.’

18 DPLP definition: ‘Outbreak of a disease with a very wide and simultaneous international geographic
distribution.’

19 <https://www.linguateca.pt/CETEMPublico/>; last access: August 1, 2021.

An interesting observation concerning the related and nearby/neighbouring words of
pandemia is that they differ in the dictionaries under study: in the DILP, pandemia is
associated with coronavírus, COVID-19, and outbreak (surto), while the DPLP connects it to
calamity (calamidade), but also to fatigue (fadiga) or tiredness (cansaço), and to covidic
(covídico). Once again, the DPLP presents slightly more entries related to pandemia than the
DILP (Table 4):


Table 4: Data retrieved from the DILP and DPLP regarding
pandemia related and nearby words.


13 units retrieved from pandemia    DILP    DPLP
calamidade                          ✗       ✓
cansaço                             ✗       ✓
coronavírus                         ✓       ✗
COVID-19                            ✓       ✗
covídico                            ✗       ✓
endemia                             ✓       ✗
epidemia                            ✓       ✗
fadiga                              ✗       ✓
pandemia                            ✓       ✓
pandémico                           ✓       ✓
pandemiologia                       ✗       ✓
pandemiológico                      ✗       ✓
surto                               ✓       ✗
Total of results                    7       8


The units pandemiologia (‘pandemiology’) and pandemiológico (‘pandemiological’), retrieved
from pandemia, are not found in the DILP. The DPLP, however, integrates these units in its
lemma list, similarly to what happened with the coronavírus-related words with the same
suffixes (coronavirologia and coronavirologista). Additionally, our research showed that even
though pandemiologia is not included in the DILP, endemiologia (‘endemiology’) is attested in
that dictionary. Given the instability of the insertion of these units in the dictionaries,
they can also be considered cases of units losing their neologism status.


6.1.4 tele-


Both dictionaries mention that the prefix tele- is a compositional element associated
with the concept of distance; however, only the DPLP makes explicit that this prefix can also
be used as a truncated form of television (Figure 10).

Similarly, Cunha and Cintra (1984) attested two homonymous compositional elements tele-,
related to distance and to television respectively. Since the lemma teledisco (‘music video’)
is included in both dictionaries and the information regarding its etymology indicates that
tele- is a truncation of television (tele[visão]+disco), one can conclude that the definition
of this prefix is incomplete in the DILP.


[Screenshots of the two dictionary entries; the definitions are translated in footnotes 20 and 21.]


Figure 10: The entry of the prefix tele- in the DILP²⁰ (left) and DPLP²¹ (right).


When it comes to the nearby/neighbouring and related words, the DPLP is much more 
prolific than the DILP (Table 5): 


Table 5: Data retrieved from the DILP and DPLP regarding
tele- related and nearby words.


14 units retrieved from tele-    DILP    DPLP
longe                            ✓       ✗
teleadministração                ✗       ✓
telealarme                       ✗       ✓
telealuno                        ✗       ✓
teleautografia                   ✗       ✓
teleautógrafo                    ✗       ✓
telecêntrico                     ✗       ✓
telecomandar                     ✗       ✓
teledifundir                     ✗       ✓
teledinâmico                     ✗       ✓
teledirigir                      ✗       ✓
teleguiar                        ✗       ✓
televisual                       ✗       ✓
telex                            ✗       ✓
Total of results                 1       13
While the DILP only associates the unit longe (‘far’) with the prefix tele-, the DPLP
relates it to units that convey the concepts of distance (teleadministração,
‘teleadministration’; telealarme, ‘telealarm’; telealuno, ‘telestudent’; teleautografia,
‘telautography’; teleautógrafo, ‘telautograph’; telecêntrico, ‘telecentric’;
telecomandar/teleguiar, ‘to operate (something) by remote control; to remote-control’;
teledinâmico, ‘teledynamic’; teledirigir, ‘to control from a long distance’; telex) and
television (teledifundir, ‘to broadcast by television’; televisual; and telealuno, which can
also occur in the context of television), in line with its definition.

In short, we can conclude that most units related to the chosen examples are included in
the lemma lists of the DILP and DPLP. As for the few units that only show up in one of the
dictionaries, we may infer that undisclosed lexicographic reasons underlie these decisions.
These units also show traces of variability and instability in their inclusion in a
dictionary, probably connected with the loss of neologicity.


6.2 Neological creativity 


In this section, we will discuss the cases of neologisms collected from various sources, 
such as media (newspapers, magazines) and social networks (Facebook, Twitter), as 
well as their typology, given that “the press and the media, in general, are an important 
gateway not only for common neologisms but also, and even more so, for specialized 
neologisms” (Guerrero Ramos 2017: 1399). 


6.2.1 coronavírus


Graphic variation concerning coronavírus that is not pointed out in the e-dictionaries was
identified in several sources: Coronavirus, corona virus or the reduction corona, Corona.
The lemma corona is also attested in the DPLP as a synonym of coronavírus, used informally.
Although coronavírus is not understood as a full synonym of COVID-19, novo coronavírus (‘new
coronavirus’) was treated as synonymous with it in some contexts.

Regarding the processes of neological formation, we have identified several neologism
candidates created by means of prefixation, suffixation, compounding, importation of
loanwords, and syntagmation, as shown below:
(i) prefixation: pré-corona (‘pre-corona’), pré-coronavírus (‘pre-coronavirus’),
    pós-coronavírus (‘post-coronavirus’), anti-corona (‘anti-corona’);
(ii) suffixation: coronado/a (corona+-ado/a, ‘with corona’, ‘coronated’; corona+lixado/a,
    ‘angry because of corona’), coronafobia (‘coronaphobia’), coronafóbico (‘coronaphobic’);
(iii) compounding: alfacoronavírus (‘alphacoronavirus’); betacoronavírus (‘betacoronavirus’);
    deltacoronavírus (‘deltacoronavirus’); gamacoronavírus (‘gammacoronavirus’); corona bónus
    (‘corona bonus’); coronaditadura (‘corona dictatorship’), corona-histeria (‘corona
    hysteria’), coronatroika, coronavirose (‘canine coronavirus’), coronavisor (‘face shield
    against corona’), coronaviral (‘coronaviral’), coronavirologia (‘coronavirology’),
    coronavirologista (‘coronavirologist’), troikavírus (troika+vírus);
(iv) syntagmation (phrasal noun constructions):
    - noun+adjective (corona): nação corona (‘corona nation’), imposto corona (‘corona tax’),
      geração corona (‘corona generation’), presidência corona (‘corona presidency’);
    - noun+preposition (de)+(article)+noun (corona): festa do coronavírus (‘coronavirus
      party’), festas do corona (‘corona parties’), tempos de corona (‘corona times’);
(v) loanwords: Corona bonds, corona bonds, coronabonds, corona-jihad, corona room,
    corona-app, corona party, corona free, coronababies, Darth Corona.


20 DILP definition: ‘element of word formation that expresses the idea of far, in the distance, at a
distance’.
21 DPLP definition: ‘1. Expresses the notion of distance (e.g., telecommuting). 2. Expresses the notion
of television (e.g., music video)’.


Corona and coronavírus are the bases of units formed by prefixation, although the base
coronavírus seems less productive in our corpus. Moreover, corona is categorized both as a
noun and as an adjective, and it can present two genders: masculine (o coronavírus, given the
gender of vírus in Portuguese) or feminine (when it modifies a feminine noun: a geração
corona). Graphic variation stands out in the loanwords.


6.2.2 COVID-19 


COVID-19 is lexicalized as a noun in the European Portuguese dictionaries, as speakers 
lose awareness of its acronymic origin. In addition to the variation mentioned in the 
dictionaries (COVID-19, covid-19 and COVID, covid), other cases of graphic variation 
were found: Covid-19, COVID-19, Covid. 

Given the difficulty of identifying the formation process of some units, which in some
cases can be associated with more than one process, we have assigned the neologisms to the
most obvious word-formation process and have followed the suggestions of the glossary A
covid-19 na língua (‘Covid-19 in the language’, Ciberdúvidas da Língua Portuguesa, 2020).²²
Prefixation, suffixation, initialisms, blending, syntagmation, and loanwords were the most
productive processes in our analysis:
(i) prefixation: anticovid-19 (‘anti-covid-19’), eurocovid (‘euro-covid’), pós-covid
    (‘post-covid’), pós-covidologia (‘post-covidology’), pré-covid (‘pre-covid’);
(ii) suffixation: covidade (COVID-19 dissemination among sportsmen), covidário (place in a
    health establishment intended for the care and treatment of patients with suspected or
    confirmed infection by COVID-19), covidês (COVID+-ês, ‘COVID-19 language’), covidiano,
    covideiro (a person who does not deny the existence of a pandemic), covídico (‘covidic’);
(iii) compounding: covidarte (COVID+arte, ‘covidart’);
(iv) blending: covidar (COVID+convidar and/or conversar, ‘to invite and/or chat’), covidar-se
    (COVID+infetar-se or convidar-se, ‘become infected or invite oneself’), covidiário
    (COVID+diário, ‘covidiary’), covidivórcios (COVID+divórcios, ‘covidivorces’), covidizer,
    covidizerque (COVID+que ouvi dizer (que), ‘that I heard (that)’);
(v) initialisms: a.C. (antes da COVID-19, ‘before COVID-19’), d.C. (depois da COVID-19,
    ‘after COVID-19’);
(vi) syntagmation:
    - noun+adjective (covid): cães covid (‘covid dogs’), enfermaria covid (‘covid infirmary’),
      fado covid (‘covid fado’), língua covid (‘covid tongue’), multa covid (‘covid fine’);
    - noun+preposition (de)+noun (covid): pandemia de covid-19 (‘COVID-19 pandemic’), surto
      de covid-19 (‘COVID-19 outbreak’), vítima de covid-19 (‘COVID-19 victim’);
    - noun+preposition (de)+article+noun (covid): ditadura da covid-19 (‘COVID-19
      dictatorship’), transmissão do covid (‘covid transmission’);
    - noun+preposition (de)+noun+preposition (de)+article+noun (covid): taxa de transmissão
      da covid-19 (‘covid-19 transmission rate’);
    - noun+preposition+article+noun (covid): vacina contra a covid-19 (‘vaccine against
      covid-19’);
(vii) loanwords: covidiota (‘covidiot’), long Covid, StayAway Covid.


22 <https://ciberduvidas.iscte-iul.pt/artigos/rubricas/idioma/covid-19-na-lingua/4059>; last access:
August 16, 2021.


The neologisms associated with COVID-19 also show a high rate of graphic variation. Like
coronavírus, the unit can occur both as a noun and as an adjective and can present two
genders, even though the masculine gender is not considered standard, given that the unit
takes the gender of disease, a doença (a feminine noun in Portuguese).


6.2.3 Pandemia


Pandemia was the source of many cases of lexical creativity. These neologisms were
created mostly through processes of prefixation, suffixation, parasynthesis, blending,
and syntagmation, as follows:
(i) prefixation: antipandémico (‘antipandemic’), pós-pandemia (‘post-pandemic’),
    pré-pandemia (‘pre-pandemic’);
(ii) suffixation: pandémico (adjective: ‘pandemic’);
(iii) blending: fraudemia (fraude+pandemia, ‘fraudemic’), infodemia (informação+pandemia,
    ‘infodemic’), pãodemia (pão+pandemia, ‘breademic’);
(iv) parasynthesis: antipandémico (‘antipandemic’), pós-pandémico (‘post-pandemic’),
    infodémico (adjective: ‘infodemic’);
(v) syntagmation:
    - noun+adjective (pandemia): pandemia covid-19 (‘covid-19 pandemic’), geração pandemia
      (‘pandemic generation’);
    - noun+preposition (de)+noun: pandemia de covid-19 (‘covid-19 pandemic’), diário de
      pandemia (‘pandemic diary’), tempo(s) de pandemia (‘pandemic time(s)’);
    - noun+preposition+article+noun: combate à pandemia (‘fighting the pandemic’), batalha
      contra a pandemia (‘battle against the pandemic’), pico da pandemia (‘pandemic peak’),
      propagação da pandemia (‘pandemic spread’), pandemia da desinformação (‘misinformation
      pandemic’), pandemia da pobreza (‘poverty pandemic’), contenção da pandemia (‘pandemic
      containment’);
    - noun+preposition+article+adjective+noun: pandemia do novo coronavírus (‘new coronavirus
      pandemic’);
    - verb+(article)+noun: controlar (a) pandemia, gerir a pandemia (‘to control/manage the
      pandemic’);
    - verb+preposition+article+noun: lutar contra a pandemia (‘to fight against the
      pandemic’);
(vi) loanwords: pand-emmys (by blending of pandemic+emmys).


Contrary to the previous units, graphic variation and gender instability are not
characteristic of pandemia-related neologisms. Like those units, however, pandemia has been
used both as a noun and as an adjective to form phrasal noun constructions.


6.2.4 Tele- 


It seems that the prefix tele- was for some time related only to distance and that later,
with the advent of television, its lexical productivity also became associated with this
device. However, with the COVID-19 pandemic, the creation of new units with the semantic
value of ‘distance’ took on a prominent role, highlighting society’s need for social
distancing. This is the case of teletrabalho (‘teleworking’, ‘telecommuting’), considered by
some as a luxury; burguesia do teletrabalho (‘teleworking bourgeoisie’), regarding the
highest-paid people; telemedicina (‘telemedicine’), designed for remote patients;
telejulgamento (‘telecourt’), or virtual trials; telescola (‘teleschool’), or telensino
(‘teleteaching’), the official designation adopted in Madeira Island. The lexical creativity
associated with remote learning is not new. Specifically created for students (telealunos,
‘telestudents’) who lived in isolated locations or were unable to enroll in a school due to
lack of vacancies, the telescola operated in Portugal between 1965 and 2003. With the arrival
of the pandemic, this teaching concept was reactivated and the television programme ‘Study at
home’ was created. Consequently, an adjustment in the definition of these units was
necessary, given that telestudents can attend lessons not only by means of television (the
only medium available in the past) but also through various devices connected to the Internet.
Neologisms regarding leisure or social activities, like telepraxe (‘telehazing’), have
also emerged in this context of physical distance, and performing tasks remotely through the
Internet gave rise to new units, such as teleconsulta (‘teleappointment’), telemanutenção
(‘telemaintenance’), and teleconsultoria (‘teleconsulting’). Government institutions had to
adjust to this new concept of distance, starting to operate through teleadministração
(‘teleadministration’) to (try to) maintain a state of teledemocracia (‘teledemocracy’).
While teleworking, one can still teledizer mal dos colegas (‘tele-speak ill of co-workers’)
in an online meeting, and if customers experience problems with electronic devices purchased
at Worten, this Portuguese store can tele-resolver (‘tele-solve’) their technical issues over
the phone.


7 Conclusions 


Individuals and institutions, such as national language academies, are responsible for
the creation of neologisms, whether in the current language or in scientific and technical
settings. The COVID-19 outbreak encouraged lexical creativity, facilitating communication
about individual and social perceptions of the new life experiences brought about by the
pandemic.

Most of the selected units (coronavírus, COVID-19, pandemia, tele-) and their nearby/
neighbouring words are attested in the European Portuguese e-dictionaries (DILP, DPLP). The
units that only appear in one of the works demonstrate the variability of the lexicographic
criteria of the dictionaries. We have identified entries, considered pertinent enough to
integrate the dictionaries’ lemma lists, that are waiting for the inclusion of definitions
(DPLP), and entries that cross-refer users to other entries where the searched unit occurs
(DILP). These situations are a novelty in the lexicographic setting, given that they would
never happen in paper-based dictionaries. Another interesting aspect for reflection is that,
even while conducting our research, we detected a few changes in the lemma list (cf. Figures
1-4) and in the definitory text of some of the units under study. The attestation of new
units in a preliminary phase (in the lemma list, or in an entry microstructure that includes
only the grammatical category and the information that the definition will be added soon)
may be explained by the lexicographers’ will to respond to society’s linguistic needs with
the efficiency and speed typical of digital resources that require constant updates. However,
diachronic research would be needed to confirm whether the lexicographic description of the
entries will be completed (DPLP) or whether units that occur within attested entries but are
not yet entries themselves will be added (DILP).

As a rule, the Portuguese digital dictionaries do not mention the date of the first
occurrence or insertion of the lemmas; nonetheless, the date may occasionally appear within
their definitions, as in the case of units concerning particular diseases (COVID-19, cf.
Figure 8). Moreover, if one needs to monitor the insertion of new units or the reformulation/
adaptation of definitions in these dictionaries, virtually the only methodology within our
reach is to take screenshots of the lookup window and of the entries’ microstructure in order
to observe the lexicographic representation of certain units at a given time. The analysis of
the lexicographic representation of the selected units and nearby/neighbouring words led us
to the conclusion that no objective or obvious criteria underlie the insertion of new units
in the European Portuguese e-dictionaries, contrary to what happens in the Brazilian
Portuguese (digital) dictionaries, like the Dicionário Caldas Aulete (‘Caldas Aulete
Dictionary’), where entries may be labelled as ‘new’, ‘original’ or ‘updated’.
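
Once the information in such screenshots has been transcribed by hand, two observations of the same dictionary can be compared mechanically. The sketch below is only an illustration of that idea and not a tool used in this study: the `Snapshot` type, the function `diff_snapshots` and the labels `earlier` and `later` are assumptions, and the single data point paraphrases the DPLP treatment of COVID-19 described in section 6.1.2.

```python
from typing import Dict, List

# A snapshot maps each lemma to whether a definition was displayed when the
# screenshot was taken. The one entry below paraphrases the COVID-19 case in
# the DPLP (definition shown earlier, no definition later); the snapshot
# labels are placeholders, not recorded dates.
Snapshot = Dict[str, bool]

earlier: Snapshot = {"COVID-19": True}
later: Snapshot = {"COVID-19": False}


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Report lemmas added, removed, or whose definition status changed."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "definition_changed": sorted(l for l in shared if old[l] != new[l]),
    }


changes = diff_snapshots(earlier, later)
print(changes)
```

A record of this kind would only be as reliable as the manual transcription behind it, but it would make changes in the lemma list and in the presence of definitions traceable across observation dates.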

Moving on to the neologism candidates, classifying the units according to their formation
processes was not an easy task, a difficulty that the literature confirms. On the other hand,
pragmatic and discursive aspects highlighted by Jesus (2018) proved to be crucial in the
identification of neological units present in the media and social networks.

Our preliminary findings show that coronavírus is regarded as a synonym of SARS-CoV-2. As
in other languages, this unit was subject to some graphic instability (corona, corona
virus); it is frequently mentioned as a new type of coronavirus (novo coronavírus); it is
used as the first element of compound words (coronaditadura, corona-histeria, coronatroika),
and is also found in loanwords (coronabond, corona room, corona-app); prefixes are often used
as time markers (pre-, post-), as in pré-coronavírus and pós-coronavírus, while suffixes like
-phobia and -phobic convey the fear of the pandemic (coronafobia, coronafóbico).

Like coronavírus, COVID-19 is regularly used as a synonym of SARS-CoV-2. Similarly,
we have observed several processes of variation, such as graphic variation
(COVID-19, Covid-19, covid) and gender assignment (a/o covid). Phonological
neologisms, related to wordplay or puns, have been found mainly on social networks:
covidizer (“que ouvi dizer”, ‘that I heard’), covidiano (“quotidiano”, ‘daily’), covidar
(“convidar”, ‘to invite’) or the reflexive covidar-se (COVID + infetar-se, ‘become
infected’). The initialisms a.C. and d.C. (antes/depois da COVID-19) are associated with
an era, being equivalent to before/after Christ (antes/depois de Cristo). The
phonological adaptation of loanwords resulted in neologisms such as covidiota.
Prefixation (anti-, euro-, pre-, post-) and suffixation (-ade, -ário, -eiro, -és,
-iano, -ico) processes were also highly productive.

Pandemia is frequently combined with prefixes (pre-, post-) to delimit a period,
and with other units related to the military semantic field (combate à pandemia,
‘fighting the pandemic’; lutar contra a pandemia, ‘to fight against the pandemic’;
batalha contra a pandemia, ‘battle against the pandemic’), showing that war metaphors
concerning the COVID-19 pandemic also occur in Portuguese. Phonological neologisms
based on wordplay or puns are frequent as well: fraudemia (‘fraudemic’), infodemia
(‘infodemic’), pãodemia (‘breademic’, a recipe whose designation conveys the idea of
the homemade bread trend during the pandemic).

Phrasal noun constructions stood out in all the neologism candidates under
study. The unit categorized as a noun (coronavírus, COVID-19, pandemia) emerged
simultaneously as an adjective in noun+adjective structures (geração
corona/pandemia, língua covid). Additionally, the structure noun+preposition+noun
(tempos de corona/pandemia, pandemia/surto/vítima de covid-19) recurred often,
admitting alternative structures with articles and other prepositions.

The prefix tele- conveys the idea of distance and forms words related to the
use of telephones or television. However, in this new context, we observed that tele-
generally expressed not only the concept of distance but also physical absence from
the workplace or other events. The physical distance imposed by the pandemic was
extended to the online world (internet and other telecommunication means);
therefore, tele- is not necessarily related to television, as it was before, when
speaking about teletrabalho (‘teleworking’), telescola (‘teleschool’), or telepraxe
(‘telehazing’).

We believe that this research demonstrates the vitality of lexical neology
processes from our synchronic lexicographic material in the domain of COVID-19 in a
specific period (December 2019-July 2021). In addition to contributing to the
neology field, this work will also result in the collection of detailed synchronic
lexicographic material from the European Portuguese variety. Only the future will
tell whether the creative linguistic phenomenon that emerged from the pandemic will
persist in the Portuguese language (namely the loss of the neologism status of
particular units while being incorporated in the current language lexicon) or
whether it will be a source of occasionalisms circumscribed in time and space while
the COVID-19 outbreak lasts.

Ieda Maria Alves, Beatriz Curti-Contessoto, Lucimara Costa
COVID-19 terminology and its dissemination 
to a non-specialised public in Brazil 


1 Introduction 


The COVID-19 pandemic has impacted numerous sectors at different levels and has 
imposed a radical change in the pace of life in societies across the globe. Its conse- 
quences can be seen in various social spheres: health systems have been on the 
brink of collapse, the economies of many nations have been almost paralyzed, and 
people have adopted other ways of social contact, principally virtually. In all these 
areas, language has been present. Inevitably, it has also been influenced itself by 
this pandemic context. 

A partially technical vocabulary related to COVID-19 quickly became part of
everyday life, introduced mainly by news and official bodies. Daily bulletins with
data on the numbers infected, cured and dead, contagion and vaccination maps, and
information about hygiene care, use of personal protective equipment, social
behaviour rules, curfews and possible treatments have been transmitted to a
large part of the world’s population.

In Brazil, more specifically, in addition to this partially technical vocabulary, it 
has been possible to observe the recurrence of lexical units closely related to politi- 
cal and economic issues. Some of these reflect the stance of denial by the federal 
government in the face of the pandemic, and others concern the attitudes of the 
Brazilian population regarding the ways in which they dealt with restrictions on so- 
cial interaction. 

In order to describe the characteristics of the terminology being disseminated
in Brazil, the project Study and dissemination of COVID-19 terminology was
proposed; it is being developed under the coordination of Professor Ieda Maria
Alves with the support of the Institute of Advanced Studies (IEA) at the University
of São Paulo. The dissemination of this terminology will be achieved through a
dictionary that is under development. This work is aimed at a Brazilian public, which,
in general, has difficulty in understanding this specialised language and often has
problems interpreting the guidelines and information in official health documents
regarding the prevention, transmission, diagnosis and treatment of the disease.

As a contribution towards alleviating the numerous problems caused by the
pandemic, it is hoped that, through an examination of the syntax and lexicon in
Portuguese, the present study may serve to minimise the effects caused by the
misunderstanding of medical language, making scientific knowledge more accessible
by disseminating pandemic terminology to a non-medical audience on an online
platform. The study is also a demonstration of how the Human Sciences, especially
the Language Sciences, can contribute to alleviating the effects of this pandemic
in Brazil.

Thus, within the scope of this project, the study reported here aims to detect, 
analyse and discuss the characteristics of COVID-19 terminology, in particular the 
role of the adjective novo [new] in this terminology, the high recurrence of terms in 
the plural and the resemantization of some of the terminological units used. The 
present paper also discusses how these characteristics influenced the choices that 
have guided the creation of the proposed dictionary. This paper presents, therefore, 
the results of the analyses of these aspects, starting with a discussion of the relation 
between terminology and neology and arriving at the characteristic aspects of the 
macrostructural and microstructural choices about which some considerations 
were made. 


2 The constitution of the corpora in the study 


The present study was based on two corpora. The first of these is the Official Corpus
(OC), which is composed of 993 texts concerning COVID-19 published on the following
official websites: Organização Mundial da Saúde (OMS), Organização Pan-Americana
da Saúde (OPAS), Ministério da Saúde do Brasil (MinSaude), Agência Nacional de
Vigilância Sanitária (Anvisa), Instituto Butantan,¹ Fundação Oswaldo Cruz (Fiocruz)² and
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP).³ Given the target
public of the dictionary in preparation, the Journalistic Corpus (JC) was also created


1 The Butantan Institute is the main immunobiological producer in Brazil and is responsible for a 
large percentage of the production of hyperimmune serums and a large volume of the national pro- 
duction of vaccine antigens, which make up the vaccines used in the PNI (National Immunization 
Program) of the Brazilian Ministry of Health (Instituto Butantan 2021). 

2 Created in 1900 as a pioneering initiative in the country, the Oswaldo Cruz Institute (OCI/Fioc- 
ruz) [. . .] constitutes a complex that generates knowledge, products and services in the biomedical 
area to meet the health needs of the Brazilian population (FioCruz 2021). 

3 São Paulo Research Foundation is one of the main agencies for promoting scientific and
technological research in the country (Agência FAPESP 2021).

with the purpose of selecting the most used terms in wide-circulation press outlets in
Brazil. This complementary corpus contains 460 texts collected from the following
websites: Folha de S. Paulo (FSP), O Estado de S. Paulo (ESP) and O Globo (GLO).

These corpora were compiled following a methodology based on the web as
corpus (cf. Kilgarriff 2013). For that purpose, the tools BootCaT (Bootstrap Corpora
and Terms from the Web), version 1.21 (Zanchetta et al. 2011), and AntConc (Anthony
2012) were used. First, BootCaT served to find the texts on the topic in question
that were available on the web until March 2021. Once the corpora were constituted,
the compiled texts were processed with AntConc, in which the observation and
selection of the lexical units present in the corpora were carried out. To that end,
different types of lists (keywords, wordlist, clusters and concordance) were
created.
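
The list types mentioned above can be illustrated with a minimal sketch. It is not AntConc's implementation: the tokenisation rule, the function names and the choice of log-likelihood as the keyness statistic are assumptions made here for illustration only, and the sample sentences are invented.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lowercased word tokens; accented letters and internal hyphens are kept.
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())


def wordlist(texts):
    # Frequency of every word form across a list of texts.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def clusters(texts, n=2):
    # Frequency of every contiguous n-word sequence (never crossing texts).
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    # Keyness of one item: observed frequencies in a study corpus and a
    # reference corpus are compared with the frequencies expected if the
    # item were equally common in both.
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    if freq_study:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


# Invented sample sentences, not taken from OC or JC.
sample = ["Novas variantes do vírus", "novas variantes e novas cepas"]
print(wordlist(sample).most_common(3))
print(clusters(sample, n=2).most_common(3))
```

Word lists and clusters of this kind only yield candidates; deciding which candidates are terms still requires the terminological criteria discussed in the text.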

These resources were used to identify terminological candidates present in the 
corpora. These candidates were then checked in the light of terminological assump- 
tions. Among the criteria used in this process, there are those presented by Barros 
(2007), which were used in order to verify the degree of lexicalization of terminolog- 
ical phrases and to determine the limits of the syntagmatic terminological units. 


3 Main characteristics related to the constitution 
of COVID-19 terminology 


Rey (1995: 11) states that the history of science and technology has shown that the 
relations between terminology and neology can be found since the first people 
began to name concepts and elements of their environment. The author stresses 
this character of the constitution of terminologies by emphasising that “terminology 
is fundamentally concerned with names and the process of naming” (Rey 1995: 11). 
COVID-19, an infectious disease, is caused by the new coronavirus, which is a
virus of the “family of Coronaviridae that causes infections in humans and animals
(e.g., respiratory diseases, gastroenteritis etc.)”⁴ (Houaiss 2012, online, our
translation). Since the disease is caused by a previously unknown virus, which is
nonetheless part of a family of existing and already known viruses, it has been named novo
coronavírus [new coronavirus] by the World Health Organization (WHO). This desig- 
nation, released on 30 December, 2019 by the Director-General of the WHO, Tedros 
Adhanom, specifies that, notwithstanding the fact that it is a virus of an existing 
family, this new member has its own characteristics, which correspond to the first 


4 Original: “família dos coronavirídeos, causadores de infecções em seres humanos e em animais
(p. ex., doenças respiratórias, gastrenterite etc.)”.

meaning attributed to the adjective novo [new]: “1 that which was born or appeared
recently, which has little lifetime, little time of existence (it is said esp. of living
beings)”⁵ (Houaiss 2012, our translation).

These new features of novo coronavírus have determined, in the studied
terminology, the creation of several syntagmatic formations made up of the adjective
novo and its gender and number inflections in Brazilian Portuguese (novos,
nova(s)), which attribute to these lexical units the characteristics expressed in the
aforesaid meaning. Among these formations, the present paper mentions the most
frequent, mainly used in the plural because they refer, in various contexts, to a large
number of novas cepas [new strains], novas linhagens [new lineages] and novas
variantes [new variants] of the virus. Some examples⁶ extracted from OC and JC are
presented below:


(1) Os vírus mudam sempre, e são essas mudanças que levam ao surgimento de novas
cepas ou variantes, que podem ou não ser mais perigosas. Mais de quatro
mil mutações foram descritas no Sars-CoV-2 desde o início da pandemia.
(<JC_GLO_211220>)


(2) A cada mudança do vírus, são geradas também novas linhagens, o que justifica
a necessidade de novos estudos para uma melhor compreensão dos fatores
clínicos e epidemiológicos relacionados à doença. (<OC_FIOCRUZ_110221>)


(3) Há o risco de que novas variantes do vírus possam acabar “driblando” a vacina e
muitos especialistas estimam que com o tempo, seja necessário atualizar a
vacina e reaplicá-la, como ocorre com as vacinas da gripe. (<JC_ESP_170121>)


Louis Guilbert, in two pioneering works on the constitution of terminologies, em- 
phasises the role of classifier or specifier of adjectives, which, employed after the 
substantival core of a syntagma, allow the integration of this syntagma into another 
specialty area. In La formation du vocabulaire de l’aviation (1965a), Guilbert high- 
lights the importance of the adjective aérien in the formation of aviation vocabu- 
lary, which is related to a new type of transport: 


5 Original: “1 que nasceu ou apareceu recentemente, que tem pouco tempo de vida, de existência 
(diz-se esp. de seres vivos)”. 

6 Our translations: (1) Viruses are always changing, and it is these changes that lead to the
emergence of new strains or variants, which may or may not be more dangerous. More than four
thousand mutations have been described in Sars-CoV-2 since the beginning of the pandemic. (2) With
each change in the virus, new lineages are also generated, which justifies the need for further
studies to better understand the clinical and epidemiological factors related to the disease. (3)
There is a risk that new variants of the virus could end up “bypassing” the vaccine, and many
experts estimate that, over time, it will be necessary to update the vaccine and reapply it, as
with flu vaccines.

When the transfer from an old semantic field to a new semantic field takes place in the form of
an integrating syntagma, the second element, which has an adjectival form, is the main
linguistic instrument of this transfer. The most common among transfer adjectives is aérien.⁷
(Guilbert 1965a: 198, our translation)


Some examples of formations with aérien cited by Guilbert (1965a) are: argonaute 
aérien, navigateurs aériens, route aérienne, voiture aérienne, voyageurs aériens. 

In another work, Le vocabulaire de l'astronautique (1965b), Guilbert studies as- 
tronautics terminology, which was born in the early 1960s. Unlike his study on avi- 
ation, which has a diachronic character, this new study refers to a synchrony from 
1961 to 1963. This work mentions some syntagmatic terms, whose high frequency is 
noted and which are formed with the adjectives cosmique and spatial. This termi- 
nology includes, respectively, biologie cosmique, firmament cosmique and cabine 
spatiale, plate-forme spatiale, among others. In these examples, it is observed that 
the “key adjectives” that indicate that a term is integrated (or is becoming inte- 
grated) into astronautics terminology are cosmique and spatial, which are part of 
several syntagmatic terms in this specialised area. 

Humbley, in his study entitled La néologie terminologique (2018), refers to 
e-commerce terminology (commerce électronique). He affirms that it originated 
from an ancestral domain, or source domain (domaine source), which is the do- 
main of commerce. In this example, the element that enables the transfer between 
domains is the adjective électronique.

Regarding COVID-19 terminology, the adjective novo is highly recurrent, which
enables us to consider it a characteristic of this terminology related to the
novelty of the pandemic. However, in the case of novo coronavírus, the adjective
novo loses the qualifier character it has in novas cepas, novas linhagens and novas
variantes and takes on the role of classifier/typifier, given that what is being
formed is a proper name from an already existing proper name (cf. Neves 2018). The
term coronavírus, which is part of this formation, was already a name given to a
disease through the name of a virus.

In addition to the terminological characteristic regarding the use of the adjec- 
tive novo in several terminological syntagmas found in OC and JC, these corpora 
exposed another interesting recurrence: the constancy of the plural form in several 
substantive terms. This use of the plural occurs because the effects of infection 
caused by the new coronavirus are not singular. Indeed, they can be exacerbated 
by different diseases, including diabetes, hypertension, heart and lung diseases. 


7 Original: “Quand le transfert d'un champ sémantique ancien à un champ sémantique nouveau
se réalise sous forme de syntagme d'intégration, le second élément de forme adjectivale est
l'instrument linguistique principal du transfert. Le plus fréquent parmi les adjectifs de
transfert est aérien.”

The set of these diseases is designated by comorbidades [comorbidities], as
exemplified below:


(4) A probabilidade de uma pessoa obesa desenvolver a forma grave da Covid-19
é alta independentemente da idade, do sexo, da etnia e da existência de
comorbidades como diabetes, hipertensão, doença cardíaca ou pulmonar.⁸
(<OC_FAPESP_030920>)


Consequently, the forms of prevention and treatment of the disease are also
multiple, as are the possible side-effects of the vaccine, and this characteristic is
expressed by terms used primarily in their plural form. This use is exemplified by the
terms eventos adversos [adverse events] and equipamentos de proteção individual
(EPIs) [personal protective equipment (PPE)].

Eventos adversos [adverse events] related to the COVID-19 vaccine are varied, 
with headache, fever, myalgia, diarrhoea, nausea and localised pain being the most 
common: 


(5) Até o dia 04/02/2021, foram notificados ao Ministério da Saúde 7.768 eventos
adversos supostamente associados às vacinas contra a Covid-19. Desses, 7.686
foram classificados como eventos não graves e 82 foram classificados como
graves. Os eventos mais comuns foram cefaleia, febre, mialgia, diarreia, náusea
e dor localizada.⁹ (<OC_MinSaude_210720>)


Equipamentos de proteção individual (EPIs), including surgical masks, respirators
and glasses, protect professionals who work with health equipment and patients
with infectious diseases:


(6) Usadas em conjunto com outros equipamentos de proteção individual (EPIs),
como máscaras cirúrgicas, respiradores e/ou óculos, aumentam a proteção
oferecida aos profissionais que estão atuando nos equipamentos de saúde.¹⁰
(<OC_FAPESP_060520>)


8 Our translation: (4) The probability of an obese person developing the severe form of Covid-19 is 
high regardless of age, gender, ethnicity and of the existence of comorbidities such as diabetes, hy- 
pertension, heart or lung disease. 

9 Our translation: (5) Until 4 February, 2021, 7,768 adverse events allegedly associated with
Covid-19 vaccines were reported to the Ministry of Health. Among them, 7,686 were classified as
non-serious events and 82 were classified as severe. The most common events were headache, fever,
myalgia, diarrhoea, nausea and localised pain.

10 Our translation: (6) Used in conjunction with other personal protective equipment (PPE), such
as surgical masks, respirators and/or glasses, they increase the protection offered to professionals
who are working with health equipment.

Another characteristic of some terms related to COVID-19 concerns the
resemantization process that these units went through. Among these terms, this study
mentions, as examples, confinamento [confinement], lockdown and quarentena [quarantine].
These lexical units were reinterpreted through the extension of their concepts, which
began to be used specifically in relation to the pandemic.

By way of illustration, the following excerpts¹¹ include the terms confinamento,
lockdown and quarentena highlighted in bold:


(7) Se não tiver política econômica, monetária e fiscal, vamos caminhar para uma
depressão. O coronavírus não é uma gripe comum, tem muitas diferenças em
relação às crises econômicas típicas do livro-texto. É um choque de oferta e,
em seguida, de demanda, com lockdown (confinamento). (<JC_GLO_210320>)


(8) Todos os pacientes infectados com o novo coronavírus precisam ser
hospitalizados e colocados em quarentena, a menos que a condição seja grave, é
melhor evitar o ambiente hospitalar; para isolar em casa, é melhor abrir a janela
para ventilação. (<OC_FAPESP_040320>)


Confinamento refers to the “act or effect of imposing (by the authority, the
government) a determined residence on an individual, away from social contact”, or to
“prison isolation”¹² (Houaiss 2012, our translation). In its turn, lockdown, a
borrowing from English that had already occurred in Brazilian Portuguese, designates the
“action of isolating people, confining them for a certain time (at home, on a ship, in
a hospital, etc.), for safety reasons (during pandemics, for example)”¹³ (Houaiss
2012, our translation) and competes with its translated form, confinamento, within
the scope of COVID-19. However, these two terminological units were reinterpreted
in the pandemic context. Although the definition of lockdown has one meaning that
refers to a type of security measure adopted in pandemics in general, both units
came to designate, in a more specific way, measures that had to be established by
the government for the purpose of containing the new coronavirus in society.

Another terminological unit that reveals this extension of meaning is quarentena.
It is true that its concept already included semantic features related to: “40-day

11 Our translations: (7) If there is no economic, monetary and fiscal policy, we are heading for a
depression. Coronavirus is not a common flu, it has many differences from typical textbook
economic crises. It is a supply shock and, subsequently, a demand shock, with lockdown
(confinement). (8) All patients infected with the new coronavirus need to be hospitalized and
placed in quarantine; unless the condition is severe, it is best to avoid the hospital
environment; to isolate at home, it is better to open the window for ventilation.

12 Original: “ato ou efeito de impor (a autoridade, o governo) uma residência determinada a um
indivíduo, longe do contato social”, ou a um “isolamento prisional”.

13 Original: “ação de isolar pessoas, confinando-as por certo tempo (em casa, num navio, num
hospital etc.), por medida de segurança (durante pandemias, p. ex.)”.

period”, “set of measures and restrictions that specifically consisted in the isolation,
for a certain time (origin. 42 days), of individuals and goods from regions where
epidemics of contagious diseases were raging”, “set of restrictions and/or isolation, for
variable periods of time, imposed on individuals or cargo from countries where
epidemics of contagious diseases occur” and “Lent”,¹⁴ among others (Houaiss 2012, our
translation). In none of its meanings, however, is there a close relation with the
context of the COVID-19 pandemic. Therefore, what this term reveals as a characteristic
marked by the pandemic context is precisely the fact that it refers to one of the
government measures to contain this disease, which sought to close cities, forced the
population to isolate themselves in their homes and varied in duration according to
the occupancy rate of hospital beds. This terminological unit also concerns the
isolation, during the incubation period of the new coronavirus, practised by people who
have had possible contact with patients or have travelled through regions and
situations of high risk of contagion (through air travel, for example), as well as by
patients, so that they do not spread the virus.


4 Characteristics of COVID-19 terminology 
and terminographical implications 


The corpus relating to the terminology of COVID-19, presented above, was created as
part of the elaboration of a terminological dictionary aimed at speakers who are not
specialists in the medical field and who have little formal education. The aspects of
this terminology that are emphasised in the present paper (the use of the adjective
novo, the recurrence of the plural form in various substantive and syntagmatic terms,
and the process of resemantization that various lexical units have gone through)
guided some aspects of the constitution of the dictionary that are highlighted in this
section.

The motivation for the creation of this dictionary arises from the fact that Brazil
is a country of vast dimensions, with a population of 213,797,113 inhabitants
on 11 January, 2021, according to data from the Brazilian Institute of Geography
and Statistics (IBGE). This population is highly heterogeneous with regard to
its educational level, which varies according to several factors, especially
geographical and social ones.

Data from the Functional Literacy Indicator (INAF), released in 2018, indicate that 
29% of the Brazilian population has difficulty in interpreting texts and performing 


14 Original: “período de 40 dias”, “hist.med conjunto de medidas e restrições que consistia esp.
no isolamento, durante certo tempo (orign. 42 dias), de indivíduos e mercadorias provenientes de
regiões onde grassavam epidemias de doenças contagiosas”, “infect conjunto de restrições e/ou
isolamento, por períodos de tempo variáveis, impostos a indivíduos ou cargas procedentes de
países em que ocorrem epidemias de doenças contagiosas” e “Quaresma”.

simple mathematical operations in their daily activities. The INAF tests are applied to
Brazilians between 15 and 64 years old, with the aim of analysing their everyday skills
and practices in reading, writing and mathematics. According to the INAF results, 29%
of Brazilians are functionally illiterate. A functionally illiterate person is a
literate individual “whose poor literacy leads him/her to write very poorly and not be
able to interpret what he/she reads”¹⁵ (Houaiss 2012, our translation). The
INAF divides the functionally illiterate into two groups: absolute (8%), who cannot
read words or phrases and telephone numbers, for example, and rudimentary (21%),
who have difficulty in identifying ironies and sarcasm in short texts, and in
performing simple operations such as calculating money (INAF 2021).

Based on these data and the interpretation difficulties manifested by about 29%
of Brazilians between 15 and 64 years old, the macro- and microstructure of the
dictionary were designed for Brazilian speakers with different levels of education,
especially those with little schooling.

Regarding the macrostructure, the terms will be presented according to the
number of occurrences verified in the studied corpora. As noted in the previous
section, these corpora indicate the predominant use of the plural form in several
substantive and syntagmatic terms, since the effects of the infection caused by the
new coronavirus can reach several organs, cause different diseases and require
different forms of treatment. Consequently, the forms of prevention and treatment of
the disease are also multiple, and are expressed by terms used primarily in the plural
form. This usage was exemplified with the terms comorbidades, eventos adversos and
equipamentos de proteção individual (EPIs). The dictionary will highlight the
plurality expressed by these terms by presenting their respective entries in the
plural form. It is important to note that these terms do not correspond to the concept
of pluralia tantum, as they also occur in the singular, though much less frequently,
within the scope of the studied terminology. In COVID-19 terminology, they are most
commonly used in the plural because they refer to the multiple impacts of the virus.
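
The choice of a plural headword on the basis of corpus frequency can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the counts are invented placeholders rather than figures from OC or JC, and the rule of keeping the singular on a tie is a choice made here, not one reported in the study.

```python
from collections import Counter


def choose_headword(singular, plural, counts):
    # Lemmatise under the plural only when it is strictly more frequent
    # in the corpus; otherwise keep the conventional singular headword.
    return plural if counts[plural] > counts[singular] else singular


# Invented placeholder counts, NOT taken from the Official or Journalistic Corpus.
counts = Counter({
    "comorbidades": 12,
    "comorbidade": 3,
    "eventos adversos": 5,
    "evento adverso": 5,
})

print(choose_headword("comorbidade", "comorbidades", counts))
print(choose_headword("evento adverso", "eventos adversos", counts))
```

Because such terms are not pluralia tantum, an entry built this way would still need to record the less frequent singular form.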

Another characteristic that influenced the selection of syntagmatic terms that 
make up the dictionary's nomenclature concerns the formations in which the adjec- 
tive novo appears. It is noted that this adjective attributes not only the quality of 
being new, but, above all, it brings a specificity to the concept designated by the 
simple term with which it is associated. In this case, the syntagmatic term novo co- 
ronavírus is an example. Thus, novo coronavírus, which, as explained above, desig- 
nates another type of virus, different from the other coronaviruses that make up the 
family of Coronaviridae, appears as a terminological entry. 


15 Original: “cuja alfabetização precária o leva a escrever muito mal e a não conseguir interpretar
o que lê”.

The presentation of the terms and their respective entries will follow an
onomasiological organization, according to the categories established by the DeCS/MeSH
(Health Sciences Descriptors 2021), released by the Pan American Health
Organization (PAHO). According to the classification established by this organization,
the terms will be separated into categories such as causative agent, anatomy,
prevention, diagnosis, disease, treatment and equipment, among others.
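
An onomasiological arrangement of this kind, combined with ordering by number of occurrences, can be sketched as a simple grouping step. The function name, the assignment of terms to categories and the frequencies below are all hypothetical placeholders chosen for illustration; they do not reproduce the dictionary's actual data or the DeCS/MeSH hierarchy.

```python
from collections import defaultdict


def build_macrostructure(terms):
    # terms: iterable of (term, category, frequency) tuples.
    # Entries are grouped by category; within a category they are ordered
    # by descending frequency, with alphabetical order breaking ties.
    by_category = defaultdict(list)
    for term, category, frequency in terms:
        by_category[category].append((term, frequency))
    return {
        category: [term for term, _ in sorted(items, key=lambda x: (-x[1], x[0]))]
        for category, items in sorted(by_category.items())
    }


# Invented placeholder data, NOT figures or categorisations from the study.
terms = [
    ("novo coronavírus", "causative agent", 40),
    ("comorbidades", "disease", 12),
    ("COVID-19", "disease", 95),
    ("equipamentos de proteção individual (EPIs)", "equipment", 8),
]

for category, entries in build_macrostructure(terms).items():
    print(category, entries)
```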

As the dictionary is intended for a broad and non-specialised public in the med- 
ical field, it will follow the principles of an international tendency towards using 
simplified language, more easily understood by users who are not specialised in the 
area in question. Due to these principles, the definitions and explanatory or com- 
plementary notes, in relation to the term, are being written in plain language, 
which designates texts understandable by different types of speakers. 

Named plain language in English, this type of communication has been designated
in Brazil as linguagem simples [simple language], linguagem clara [clear
language], linguagem cidadã [citizen language], acessibilidade textual e terminológica¹⁶
[textual and terminological accessibility] and inteligibilidade¹⁷ [intelligibility], but
the first term is predominant in the country. According to the Plain Language Network
(Plain Language 2021), an international association for plain language supporters and
practitioners around the world, which includes members from 30 countries and at least
15 languages, “a communication is in plain language if its wording, structure, and
design are so clear that the intended audience can easily find what they need,
understand what they find, and use that information”.

In Brazil, some government initiatives are adopting and disseminating this type
of language. The Rede Linguagem Simples [Simple Language Network], for example,
created by the federal government (Empresa Brasil de Comunicação, EBC), is “a space
for debate and fostering the construction of initiatives that promote the use of
simple language”¹⁸ (Agência Brasil 2021, our translation).

In 2016, the São Paulo state government launched a manual called
Orientações para adoção de linguagem clara [Guidelines for adopting clear
language], which


should be understood as a guide for the elaboration of a Clear Language (CL), to be used in the way
the São Paulo State government disseminates its information on the Internet, especially with regard
to the meaning of technical expressions routinely used by specialists of the various areas of
government action, in order to make them more accessible to the understanding of the common citizen.¹⁹
(Governo do Estado de São Paulo 2016, our translation)


16 cf. Cortina Silva et al. (2021). 

17 cf. Carvalho and Rebechi (2021). 

18 Original: “um espaço de debate e de fomento para construção de iniciativas que promovam o
uso da linguagem simples”.

19 Original: “deve ser entendido como um roteiro para elaboração de uma Linguagem Clara (LC),
para ser acoplada à maneira como o governo do Estado de São Paulo divulga suas informações na
In 2019, the City Hall of São Paulo launched the Programa Municipal de Linguagem
Simples [Municipal Simple Language Program], with the release of a booklet, called 
Princípios de uma Linguagem Cidadã e Manual de boas práticas de redação da Carta 
de Serviços da Prefeitura de São Paulo de Linguagem Cidadã [Principles of a Citizen's 
Language and Manual of good practices for writing the Charter of Services of the City 
Hall of São Paulo for Citizen's Language], in order to 


help PMSP servants to write texts for the population in a clear, inclusive and understandable 
way for people of all genders, classes and educational levels, discarding the use of bureau- 
cratic and formal language used in public offices, which is also often used to address São 
Paulo society.²⁰ (Prefeitura de São Paulo 2021)


The concept of simple language does not imply the use of simplistic or informal
language or the elimination of information elements. It seeks to use understandable
language with clear information, avoiding problems frequently reported by readers,
which stem from long sentences, passive verbs, acronyms, abbreviations, little-used
adjectives and unexplained terms. Preference should be given to common words that are
better known by users, and to the usual syntactic order of the language (Cortina
Silva et al. 2021). In keeping with the precepts of simple language, the definitions
in the dictionary are preferably written as a single sentence, which is possible in
most cases.

Each of these definitions primarily uses the classic two-part structure of nearest
genus (genus proximum) + specific differences (differentiae). The nearest genus serves
as the initial descriptor of the definition and carries over the conceptual content of
the term's hyperonym (a more generic term in relation to other terms); it therefore
conveys the general characteristics of the term, expressing the general category or
class to which the element belongs. The specific differences present the
particularities that distinguish the term from others of the same class.

Other definitions, such as extensional ones, in which the elements that characterise
the term are enumerated, are also used where several features need to be listed. An
example of this type of definition is that of the term medidas preventivas: medidas
preventivas consistem em lavar as mãos, usar máscara, usar álcool em gel, evitar
aglomerações [preventive measures consist of washing hands, wearing a mask, using
alcohol gel and avoiding crowds].


Internet, em especial no que diz respeito ao significado de expressões técnicas utilizadas rotineira- 
mente por especialistas das diversas áreas de atuação governamental, de modo a torná-las mais 
acessíveis à compreensão do cidadão comum”. 

20 Original: “auxiliar servidoras e servidores da PMSP a redigirem textos destinados à população 
de forma clara, inclusiva e compreensível a pessoas de todos os gêneros, classes e níveis de instru- 
ção, descartando o uso da linguagem burocrática e formal utilizada nas repartições públicas, e que 
muitas vezes também é usada para se direcionar à sociedade paulistana”. 
By way of illustration, two examples of definitions are presented, accompanied
by their respective entries (with English equivalent, context of use, explanatory
note). These terms were extracted from the causative agent category (carga viral)²¹
and from the diagnosis category (comorbidades):²²


Carga viral s.f. Quantidade de vírus encontrada em amostras de sangue ou em outros fluidos
da pessoa infectada.
Ing. viral load

Uma carga viral mais elevada foi observada mais frequentemente nos pacientes do sexo
masculino e nos mais idosos. Febre e artralgia (dor nas articulações) foram os sintomas mais
associados a uma carga viral elevada. (<CO_FAPESP_290421>)

Nota: O termo fluido designa uma substância que corre como um líquido, a exemplo de sangue.


Comorbidades s.f.pl. Duas ou mais doenças presentes em uma pessoa, como diabetes, doença
cardíaca ou pulmonar.
Ing. comorbidities

Identificar grupos de maior risco para adoecimento, agravamento e óbito: Idosos; Pessoas com
comorbidades: Diabetes, HAS, Doenças cardíacas/cerebrovasculares, DPOC, Renal, Obesidade,
Câncer, Transplantados, Anemia Falciforme (<CO_MinSaude_271120>)

Nota: O termo comorbidade é usualmente empregado no plural porque se refere a pelo menos
duas doenças.


The examples presented above show, respectively, carga viral and comorbidades as 
terminological entries. In carga viral, an explanatory note was presented to eluci- 
date the meaning of the term fluido [fluid], as it is little known to the general 
public. 
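
The microstructure of these entries (headword, grammatical label, definition, English equivalent, context of use with its corpus code, and optional note) can be sketched as a small data structure. The class name `TermEntry`, its field names and the `render` layout are illustrative assumptions; only the content of the example entry comes from the chapter.

```python
from dataclasses import dataclass


@dataclass
class TermEntry:
    """One terminological entry; field names are illustrative assumptions."""
    headword: str
    grammar: str        # grammatical label, e.g. "s.f." (feminine noun)
    definition: str     # single-sentence definition in plain language
    english: str        # English equivalent
    context: str        # attested context of use
    source: str         # corpus code of the context
    note: str = ""      # optional explanatory note

    def render(self) -> str:
        """Lay the entry out in the order used in the examples above."""
        lines = [
            f"{self.headword} {self.grammar} {self.definition}",
            f"Ing. {self.english}",
            f"{self.context} (<{self.source}>)",
        ]
        if self.note:
            lines.append(f"Nota: {self.note}")
        return "\n".join(lines)


carga_viral = TermEntry(
    headword="Carga viral",
    grammar="s.f.",
    definition=("Quantidade de vírus encontrada em amostras de sangue "
                "ou em outros fluidos da pessoa infectada."),
    english="viral load",
    context=("Uma carga viral mais elevada foi observada mais frequentemente "
             "nos pacientes do sexo masculino e nos mais idosos."),
    source="CO_FAPESP_290421",
    note=("O termo fluido designa uma substância que corre como um líquido, "
          "a exemplo de sangue."),
)

if __name__ == "__main__":
    print(carga_viral.render())
```

Keeping the note as an optional field reflects the fact that it is only added when a word in the definition is expected to be unfamiliar to the target reader.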

The fact that the term comorbidades was registered in its plural form reveals 
one of the aspects observed in relation to the constitution of COVID-19 terminology, 
which was explored in the previous section. This aspect is not only reflected in the 
expression of this term, but also in its definition, since the idea of plurality, as a 


21 Our translation: Viral load f.n. Amount of virus found in blood samples or other fluids from the 
infected person. Eng. viral load. A higher viral load was seen more frequently in male and older 
patients. Fever and arthralgia (joint pain) were the symptoms most associated with a high viral 
load. («OC FAPESP 290421»). Note: The term fluid designates a substance that flows like a liquid, 
such as blood. 

22 Our translation: Comorbidities f.n. in pl. Two or more diseases present in a person, such as dia- 
betes, heart disease or lung disease. Eng. comorbidities. To identify groups at higher risk for illness, 
aggravation and death: Elderly; People with comorbidities: Diabetes, SAH, Heart/Cerebrovascular 
Diseases, COPD, Kidney disease, Obesity, Cancer, Transplants, Sickle Cell Anemia. («OC Min- 
Saude 271120»). Note: The term comorbidity is usually used in the plural because it refers to at 
least two diseases. 
set, is incorporated within it. The preference for the plural over the singular form
is explained by the note in this entry.


5 Final considerations 


The aim of the present study was to detect, analyse and discuss the characteristics of 
COVID-19 terminology, in particular the role of the adjective novo in this terminology, 
the high recurrence of terms in the plural and the resemantization of some of the ter- 
minological units used. It also sought to fulfil the objective of specifying how these 
terminological characteristics are reflected in the constitution of a COVID-19 dictio- 
nary, which is under preparation. 

Regarding the adjective novo, which proved to be quite recurrent in this terminology,
it was found that, when added to the term coronavírus, it functions as a classifier.
This function is not generally performed by this particular adjective, especially when
it is placed before the noun that it modifies. In other terminologies, such as those
studied by Guilbert and Humbley, mentioned above, classifier adjectives are of a
different, less common kind and are directly related to a specialised domain in
formation.

The observation of corpora also revealed a high productivity of lexical units
employed in their plural form. The examples discussed in this paper show that this
plurality helps to build the concepts designated by the terms under study: the plural
expresses which comorbidities are in focus in the current pandemic context, the side
effects associated with the COVID-19 vaccine and the personal protective equipment
(PPE) necessary for professionals on the front lines of the fight against the
pandemic. In these cases, the use of the plural indicates a very particular
characteristic of COVID-19 terminology: the idea of a set, of the collective, is
essential to the concepts to which these units refer, and it also links these terms,
from a semantic-conceptual point of view, to the pandemic.

It was also found that some terms underwent a process of resemantization: lexical
units such as confinamento, lockdown and quarentena, among others, had their meaning
expanded to reflect conceptual information related to the context of the pandemic.

These characteristics influenced the choices that guided the creation of the pro- 
posed dictionary, which is a terminological dictionary aimed at non-specialised 
readers in the medical field with little formal education. 




Bibliography 


Agência Brasil (2021): Rede quer facilitar linguagem de serviços à população. 
[https://agenciabrasil.ebc.com.br/geral/noticia/2021-03/rede-quer-facilitar-linguagem-de- 
servicos-populacao, last access: 18 August, 2021]. 

Agência FAPESP (2021): São Caetano do Sul investe na atenção primária para enfrentar a
pandemia. [https://agencia.fapesp.br/sao-caetano-do-sul-investe-na-atencao-primaria-para- 
enfrentar-a-pandemia/33604/, last access: 10 August, 2021]. 

ANVISA (2021): Agência Nacional de Vigilância Sanitária. [https://www.gov.br/anvisa/pt-br, last
access: 10 August, 2021]. 

Anthony, Laurence (2012): AntConc (Version 3.5.8) [Windows]. Tokyo, Japan: Waseda University. 
[http://www.laurenceanthony.net/software/antconc/, last access: 14 July, 2020]. 

Barros, Lídia Almeida (2007): Conhecimentos de Terminologia geral para a prática tradutória. São 
José do Rio Preto, SP: NovaGraf. 

Carvalho, Yiuli S./Rebechi, Rozane (2021): Inteligibilidade e convencionalidade em textos de 
divulgação da área médica em português brasileiro. In: Rev. Estud. Ling., Belo Horizonte, 
29(2), 959-998. 

Cortina Silva, Asafi F./Delgado, Heloísa O. K./Finatto, Maria J. B. (2021): Acessibilidade textual e
terminológica para o português brasileiro: pesquisa, estratégias e orientações de [re]escrita.
In: Revista Moara, 58, 322-343.

FioCruz (2021): Fundação Oswaldo Cruz. [https://portal.fiocruz.br, last access:
20 September 2021].

Governo do Estado de São Paulo (2016): Orientações para adoção de linguagem clara. 
[http://www.governoaberto.sp.gov.br/wp-content/uploads/2017/12/orientacoes para. 
adocao linguagem clara ptBR.pdf, last access: 20 September 2021]. 

Guilbert, Louis (1965a): La formation du vocabulaire de l’aviation (1861-1891). Paris: Larousse. 

Guilbert, Louis (1965b): Le vocabulaire de l'astronautique. Paris: Publications de l'Université de 
Rouen. 

Health Sciences Descriptors (2021): DeCS - Descritores em Ciências da Saúde.
[https://decs.bvsalud.org/, last access: 20 August, 2021].

Houaiss, Antônio (2012): Grande dicionário Houaiss. Rio de Janeiro: Instituto Antônio Houaiss.
[https://houaiss.uol.com.br/corporativo/apps/uol www/v5-4/html/index.php&4, last access:
8 November, 2021].

Humbley, John (2018): La néologie terminologique. Limoges: Lambert Lucas.

IBGE (2021): Instituto Brasileiro de Geografia e Estatística. [https://www.ibge.gov.br, last access:
30 October, 2021].

Instituto Butantan (2021): A serviço da vida. [https://butantan.gov.br, last access: 20 October,
2021].

INAF (2021): Indicador de Alfabetismo Funcional. [https://alfabetismofuncional.org.br, last access:
14 October, 2021].

Journal Folha de S. Paulo (2021): Folha. [https://www.folha.uol.com.br, last access: 14 October,
2021].

Journal O Estado de S. Paulo (2021): Estadão. [https://www.estadao.com.br, last access:
14 October, 2021].

Journal O Globo (2021): O Globo. [https://oglobo.globo.com, last access: 10 September, 2021]. 

Kilgarriff, Adam/Renau, Irene (2013): EsTenTen, a vast web corpus of Peninsular and American
Spanish. In: International Conference on Corpus Linguistics (CILC2013). Alicante, Spain, 12-19. 
DOI: https://doi.org/10.1016/j.sbspro.2013.10.617. 




PAHO (2021): Pan American Health Organization. [https://www.paho.org/en, last access: 
15 September, 2021]. 

Plain Language (2021): What is plain language? [https://plainlanguagenetwork.org/plain-language 
/what-is-plain-language/, last access: 14 August, 2021]. 

Prefeitura de São Paulo (2021): Cartilha Princípios de uma linguagem cidadã e manual de boas 
práticas de redação da Carta de Serviços da Prefeitura de São Paulo. [https://www.prefeitura. 
sp.gov.br/cidade/secretarias/upload/chamadas/3 colocado grupo 1539290157.pdf, last 
access: 14 October, 2021]. 

Ministério da Saúde (2021): Ministério da Saúde. [https://www.gov.br/saude/pt-br, last
access: 10 September, 2021]. 

Neves, Maria Helena de Moura (2018): A gramática do português revelada em textos. 1. ed. São
Paulo: Ed. Unesp. 

Rey, Alain (1995): La terminologie. Noms et notions. Paris: PUF. 

Tcacenco, Lucas/Rodrigues da Silva, Bruna/Finatto, Maria J. B. (2020): Acessibilidade textual e
terminológica. In: Revista GTLex, 3(2), 197-224.

WHO (2021): World Health Organization. [https://www.who.int, last access: 20 October, 2021]. 

Zanchetta, Eros/Baroni, Marco/Bernardini, Silvia (2011): Corpora for the masses: the BootCaT 
front-end. Proceedings of the Corpus Linguistics Conference 2011. Birmingham: University of 
Birmingham. 


Rute Costa, Margarida Ramos, Ana Salgado, Sara Carvalho, 
Bruno Almeida, Raquel Silva 


Neoterm or neologism? A closer look 
at the determinologisation process 


1 Introduction 


This paper arises from the urgent need for communication experienced throughout the
pandemic. Since its onset, several new lexical units have permeated media discourse in
general, as well as social media and other channels. These units convey information to
the public regarding the “severe acute respiratory syndrome”, namely COVID-19.¹ In
addition to its worldwide impact on health, the pandemic has had a noteworthy
influence on the linguistic landscape, and as a result a significant number of
neologisms have emerged. Within the scope of our ongoing research, we identify the
neologisms in European Portuguese that are related to the term COVID-19 via form or
meaning. However, not all the new lexical units identified in our corpus that contain
COVID-19 in their formation can unequivocally be regarded as neoterms (terminological
neologisms). Accordingly, this article aims not only to reflect on the distinction
between neologism and neoterm but also to explore the determinologisation process that
several of these new lexical units undergo.

Following the introduction, this paper is divided into 9 sections. In section 2, 
we begin by making a brief theoretical reflection concerning neological processes 


1 In this paper, the term COVID-19 is the preferred form referring to “acute respiratory syndrome”. 
and determinologisation. Then, in section 3, we describe the method used to compile
the corpus, the CoronaCorpus, which is divided into two sub-corpora: the
PressCoronaCorpus and the LSPCoronaCorpus. The PressCoronaCorpus is composed of texts
published in the Portuguese media between November 2020 and July 2021. This corpus
has been processed via Sketch Engine,² with the purpose of identifying neological
lexical constructions occurring in non-specialised communication related to the
emergence of the pandemic. Among such neological constructions, both neologisms
and neoterms were identified. The latter are defined as terms that are “specifically
coined for a given general concept” (ISO 1087:2019, §3.4.12). The second sub-corpus,
the LSPCoronaCorpus, is composed of official documents produced by healthcare
agencies, professionals and scientists. In the context of this research, this corpus
plays the role of a reference corpus.

In section 4, our corpus is explored by means of simple and advanced queries for 
extracting the spelling variants of COVID-19. Section 5, on the other hand, is focussed 
on the COVID-19 acronym, its behaviour in discourse and the re-categorisation of 
covid- as a formative in Portuguese. In section 6, we then proceed with the analysis 
of morphosyntactic and semantic formation of the neologisms and neoterms identi- 
fied in the PressCoronaCorpus, to better grasp the process underpinning neology, in 
what concerns both form and meaning. Furthermore, this section aims to describe 
some of the behaviours depicted by the new elements containing the form COVID-19 
and which occur in non-specialised communication contexts. The migration of terms 
from specialised to non-specialised contexts points towards a shift in status, from 
term to non-term. Such change, in some cases, results from determinologisation pro- 
cesses which are analysed further. 

Next, in section 6, we analyse the lexical units and terms found in our corpus 
and describe the respective neological and determinologisation processes. 

In section 7, we focus on the lexicographic treatment of four neologisms that
have been registered in Portuguese e-dictionaries available in Portugal, namely the
Dicionário Priberam da Língua Portuguesa (DPLP)³ and the Dicionário da Língua
Portuguesa of Porto Editora (DLP).⁴

Finally, based on our corpus analysis workflow, as well as on the systematic 
comparison of the aforementioned dictionary entries, a template for a lexicographic 
article targeted at neologisms is put forward in section 8, illustrated by the entry 
covid. This proposal aims, on the one hand, to address the detected inconsistencies 
in lexicographic representation in the cited Portuguese resources and, on the other 
hand, to respond to this form’s behaviour in the corpus, namely as an element used 
to create new words (e.g. covid + -ário). 


2 https://www.sketchengine.eu/ (last access: 10 June 2022).
3 https://dicionario.priberam.org/ (last access: 10 June 2022).
4 https://www.infopedia.pt/dicionarios/lingua-portuguesa (last access: 10 June 2022).
Overall, the inclusion of neoterms in dictionaries entails several challenges,
such as their morphosyntactic classifications, their definition, and which domain 
label they should or should not be assigned. 


2 Neological processes and determinologisation 
2.1 Neological processes 


A pandemic intensifies neological processes, which emerge more or less spontaneously
to quickly resolve communication issues associated with scientific phenomena that go
beyond the understanding of the non-specialised public. New words resulting from these
processes are considered to be neologisms. Within non-specialised communication,
neologisms may arise from the need to have a communicative impact on the community at
large when referring to previously non-existent realities, and may even stem from
highly specialised contexts. Neoterms, on the other hand, designate new specialised
concepts produced in a given domain of knowledge. Unlike a neologism, which is formed
spontaneously in response to communication needs, a neoterm is often formed
consciously to designate a concept and distinguish it from others in the concept
system to which it belongs, so that it can be used in specialised discourse with
a low degree of ambiguity.

Both neologisms and neoterms are linguistic phenomena that are morphosyn- 
tactically manifested via the creation of new lexical units, or, semantically, via the 
attribution of new meanings to already existing lexical units. Neologisms can, 
therefore, be analysed according to different perspectives. In line with what has 
been stated by Lino, “neologisms are simultaneously a manifestation of the evolu- 
tion of a language and the evolution of knowledge, both of which happening at an 
extremely quick pace” (2019: 10). Terminologists and lexicologists look at the
phenomenon of neology from different standpoints. In terminology science, a neologism
is defined as a “term that is specifically coined for a given general concept”
(ISO 1087:2019, §3.4.12), whereas in general language, a neologism is defined as “a
new word” or “a new meaning of an existing word in the language” (Pruvost and 
Sablayrolles 2003). The difference is quite significant. That is probably the reason 
why, in terminology science, the terms “neoterm”, “terminological neologism” (ISO 
1087:2019) and “neonymy” (Rondeau 1984) have been created to differentiate the 
conceptual level from the linguistic one. In lexicology and lexical morphology, neo- 
logisms are mainly studied as part of word formation and semantics, the latter fur- 
ther exploring topics related to semantic shifting or semantic extension. 

Relevant linguistic phenomena include, among others, the formation of terms, 
the study of collocations and phraseologies, lexical and semantic relations, formal 
and semantic neology, as well as variation. In several of these phenomena, the
identified linguistic change often has an impact on the dictionary's macro- and
microstructure, as well as on the lexical units to be selected to feed the
lexicographic resource. These lexicographic activities require that corpora be
consistently maintained and kept up to date so that neologisms and neoterms are
detected in time to meet the users’ demands.


2.2 Determinologisation processes 


Determinologisation (Guilbert 1975, Galisson 1978, Meyer and Mackintosh 2000) is 
the process by which a term is transformed into a general language word or expres- 
sion. In these cases, the term does not refer to a concept anymore and, therefore, it 
is no longer part of a concept system within a given domain. Hence, there is a se- 
mantic or conceptual shift prompted by the elimination of one or more essential 
characteristics of the concept, thereby leading the term to lose its identity and 
specificity. 

Nová (2018: 387) goes further and considers that determinologisation corre- 
sponds to the process by which “a scientific term, during its way from a field spe- 
cialist to a layperson, loses its accuracy, gets new connotations, and the word can 
be even moved to refer to a completely different thing”. 

Semantic shift and term variation are the main axes for the study of the special- 
ised lexicon appearing in scientific, technological and technical texts and dis- 
courses, both written and oral. Therefore, linguistic change in form and meaning is 
a dynamic phenomenon that cuts across the entire lexicon. The time-lapse during 
which meaning is formed, from point A to point B, is recorded in dictionaries, ency- 
clopaedias, vocabularies and ontologies through the choice of lemmas (lexical unit 
or term) and the definition of the concept and/or the explanation of its meaning. 

In terminology, the definition stabilises the relationship between the lexical 
unit (form) and the specialised concept from a domain of knowledge, in a given pe- 
riod and cultural, political or social context. Meaning is thus fixed in time, which 
could be short in areas such as science and technology. This finding, resulting from 
long years of studying the lexicon, allows us to introduce the concept of “short di- 
achrony”, which can be observed in real time. Short diachrony occurs when one ob- 
serves linguistic change at the level of lexical units, mostly specialised, because of 
immediate changes in knowledge structures, e.g., when a new concept is intro- 
duced in a specialised domain, which has a direct impact on the lexicon. This lin- 
guistic change is identified, analysed and classified but it must also be registered, 
described or defined, and dated in dictionaries to preserve linguistic heritage. 

The ‘short diachrony’ observed in scientific, technical and technological texts 
contrasts with the ‘long diachrony’ typically studied in historical linguistics. Short 
diachrony is of great importance for the construction and update of corpora. In 
science and technology, corpora age very quickly when it comes to the study of the
lexicon, requiring a constant renewal of the texts that constitute them and constant 
observance of the published literature in the domains under study. Naturally, there 
are specialised domains in which more change is observed than in others, with 
varying rhythms and dimensions. 

This is why it is relevant that the PressCoronaCorpus is a monitor corpus. As stated
by Sinclair (1996): “It became clear some years ago that the assumption of a finite
limit on a corpus for any length of time was an unnecessary restriction.”⁵ As shown
in the following sections, it is possible to observe, in a relatively short time span,
the appearance and disappearance of lexical units, as well as variation phenomena.


3 PressCoronaCorpus: the corpus of analysis 


This work has been carried out via the analysis of a dedicated monolingual (European
Portuguese) corpus comprising a journalistic subcorpus and a Language for Special
Purposes (LSP) subcorpus. As mentioned in the introduction, the journalistic corpus,
PressCoronaCorpus, was compiled using Sketch Engine, namely the WebBootCaT technology,
along with a manual identification of texts publicly available on the internet, to
capture newspaper and magazine texts related to COVID-19 topics. In addition, we
retained official documents produced by healthcare agencies, healthcare professionals
and scientists, which make up the LSPCoronaCorpus. For the purposes of our study, the
journalistic corpus is thus the corpus of analysis, whereas the LSP-based one is the
reference corpus. The latter is used to verify semantic shifts, differences in
neologism formation, as well as differences in usage, if needed.

The PressCoronaCorpus spans nine months. It is a dynamic corpus intended to represent
a snapshot of the language between November 2020 and July 2021, a time window within
the pandemic context, in order to observe how covid-driven new forms, and the
corresponding spelling variants, productively entered the Portuguese lexicon.

The texts were gathered during this period on a weekly and sometimes bi- 
weekly basis, resulting in a large collection of different text types. This collection 
was organised according to the activities and targeted audience of the texts, culmi- 
nating in a text typology as systematised in Table 1: 


5 http://www.ilc.cnr.it/EAGLES96/corpustyp/node19.html#SECTION00090000000000000000 (last
access: 10 June 2022).
Table 1: Text typology according to social activities and targeted audience.

Text type               Newspaper   Magazine
Generic                 503         -
Business & Economics    150         36
Sports                  36          2
Environment & Garden    -           3
Fashion & Socialite     -           14
Health & Lifestyle      -           7
IT & Electronics        -           8
Travel & Culture        -           24


The number of texts is significantly larger for newspapers than for magazines. Among
the newspaper texts, the most salient type is Generic, which accounts for 73% of the
total, ahead of Business & Economics and Sports, whereas the collection of magazines
is mostly related to Business & Economics, with 38%.
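
The two percentages can be reproduced from the Table 1 counts. The sketch below assumes that the flattened table is read with Generic, Business & Economics and Sports in the newspaper column and all remaining figures in the magazine column; this column assignment is an interpretation, although it is the one under which both reported percentages come out.

```python
# Counts per text type, read from Table 1 (column assignment is an assumption).
NEWSPAPER = {"Generic": 503, "Business & Economics": 150, "Sports": 36}
MAGAZINE = {
    "Business & Economics": 36,
    "Sports": 2,
    "Environment & Garden": 3,
    "Fashion & Socialite": 14,
    "Health & Lifestyle": 7,
    "IT & Electronics": 8,
    "Travel & Culture": 24,
}


def share(counts, text_type):
    """Percentage of one text type within a collection."""
    return 100 * counts[text_type] / sum(counts.values())


if __name__ == "__main__":
    print(f"Generic among newspapers: {share(NEWSPAPER, 'Generic'):.0f}%")
    print(f"Business & Economics among magazines: "
          f"{share(MAGAZINE, 'Business & Economics'):.0f}%")
```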

The capture of media texts on the Internet is not a straightforward task. Due to 
their increasing subscription-based model, some media pages are not fully available; 
consequently, the texts were not collected in a balanced quantity throughout the pe- 
riod we have set. To overcome this drawback, we decided to (i) store the collected 
texts in .txt format, (ii) organise them by trimesters, and (iii) attribute a descriptor to 
each of them (e.g. PT-NP-GE-2020-11, which stands for a Portuguese generic newspaper
published in November 2020). This decision is tied to the manual task of corpus
metadata annotation, a process that took place during the compilation of the
collected texts with Sketch Engine.
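
A descriptor of this kind can be decoded mechanically. The following is a minimal sketch, assuming the field order language, medium, text type, year and month suggested by the single example PT-NP-GE-2020-11. Only the codes PT, NP and GE are attested in the text; any other code is returned unchanged, and the names `Descriptor` and `parse_descriptor` are hypothetical.

```python
import re
from typing import NamedTuple


class Descriptor(NamedTuple):
    language: str
    medium: str
    text_type: str
    year: int
    month: int


# Assumed shape: three two-letter codes, then a four-digit year and a two-digit month.
DESCRIPTOR = re.compile(r"([A-Z]{2})-([A-Z]{2})-([A-Z]{2})-(\d{4})-(\d{2})")

# Only these three codes are attested in the paper's example.
LABELS = {"PT": "Portuguese", "NP": "newspaper", "GE": "generic"}


def parse_descriptor(code: str) -> Descriptor:
    """Split a text descriptor into labelled metadata fields."""
    match = DESCRIPTOR.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid descriptor: {code!r}")
    language, medium, text_type, year, month = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range in {code!r}")
    return Descriptor(
        LABELS.get(language, language),
        LABELS.get(medium, medium),
        LABELS.get(text_type, text_type),
        int(year),
        int(month),
    )


if __name__ == "__main__":
    print(parse_descriptor("PT-NP-GE-2020-11"))
```

Encoding the metadata in the descriptor itself means that each stored .txt file stays interpretable even outside the corpus tool.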

With regard to part-of-speech annotation, we resorted to a tagger embedded in
Sketch Engine, specifically the Portuguese FreeLing tagset, since the texts that
make up PressCoronaCorpus are in European Portuguese. In short, by merging those two
types of corpus annotation, we developed an annotated corpus enriched with
metadata, text type and corresponding topic.

Regarding the overall metrics of the corpus of analysis, despite the number of 
texts not being quantitatively balanced throughout the trimesters, the metrics are 
considerably robust for the diachronic spectrum under study. As seen in Table 2, 
the corpus has a little over 40 million tokens and more than 30 million words.⁶


6 “A word is a type of token. Words are tokens which begin with a letter of the alphabet”
(https://www.sketchengine.eu/guide/glossary/?letter=W) (last access: 10 June 2022). 
Table 2: The metrics of the corpus (tokens, words and number of texts).

         PressCoronaCorpus   15 Nov 2020-   16 Jan-        16 Mar-        16 May-
         (total)             15 Jan 2021    15 Mar 2021    15 May 2021    11 Jul 2021
Tokens   41 610 865          11 163 278     17 052 708     9 557 012      3 837 867
Words    32 930 803          8 849 548      13 438 800     7 640 911      3 001 544
Texts    986                 278            405            214            89
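
As a quick consistency check, the four period columns of Table 2 can be summed and compared with the figures given for the whole PressCoronaCorpus. The sketch below only restates the table's numbers; the variable and function names are illustrative.

```python
# Figures copied from Table 2: corpus totals and the four period slices.
TOTALS = {"tokens": 41_610_865, "words": 32_930_803, "texts": 986}
PERIODS = {
    "tokens": [11_163_278, 17_052_708, 9_557_012, 3_837_867],
    "words": [8_849_548, 13_438_800, 7_640_911, 3_001_544],
    "texts": [278, 405, 214, 89],
}


def totals_match():
    """Map each metric to True when its period values add up to the total."""
    return {metric: sum(PERIODS[metric]) == TOTALS[metric] for metric in TOTALS}


if __name__ == "__main__":
    print(totals_match())
```

The unequal sizes of the four slices are the reason why later comparisons between periods rely on per-million frequencies rather than raw counts.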


4 Exploring the corpus of analysis: the example of COVID-19


For the corpus exploration, we resorted to both simple and advanced queries, depending
on the results we were aiming at. Whereas the former use special characters, such
as /*/ (e.g. covid*, from which we obtained matches like covid, covid-19, covid19, and
so forth), the latter rely on Corpus Query Language (CQL), with regular expressions
(REGEX) at its core. For instance, to capture covid as a monolexical unit without
punctuation and digits, we used the following CQL: [word = "covid"][!(word = "-.*|.*19")].
The same text mining strategy was used for Covid and COVID, but with different
regexes for the first element [word =], given the different letter cases.
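
The two query types can be approximated outside Sketch Engine with ordinary regular expressions. The sketch below is an analogue written with Python's `re` module on raw text, not the CQL engine itself: CQL operates on tokens, whereas these patterns operate on character strings. The sample sentence is invented for illustration.

```python
import re
from collections import Counter

# Any spelling variant: "covid" optionally followed by "-19", " 19" or "19",
# in any letter case (roughly what the simple query covid* is used for).
ANY_VARIANT = re.compile(r"\bcovid(?:[- ]?19)?\b", re.IGNORECASE)

# Bare lowercase "covid" only: not followed by a hyphen or by "19"
# (a string-level approximation of the CQL query quoted above).
BARE_COVID = re.compile(r"\bcovid\b(?!-)(?!\s?19)")

# Invented sample text, not taken from the corpus.
SAMPLE = "A covid-19 e a Covid-19 continuam; o covid longo, a COVID19 e o covid 19."


def variant_counts(text):
    """Count each spelling variant exactly as it is written in the text."""
    return Counter(ANY_VARIANT.findall(text))


if __name__ == "__main__":
    print(variant_counts(SAMPLE))
    print(BARE_COVID.findall(SAMPLE))
```

Because the first pattern requires a word boundary after the optional digits, derived forms that merely start with covid are not counted as spelling variants.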


The focus of this paper is the term COVID-19, which will be further explored in
section 5.


Table 3: COVID-19 spelling variants and frequencies for Portuguese language.

Spelling variant   Frequency
covid-19           10611
Covid-19           5455
covid              3224
COVID-19           1521
Covid              1274
COVID              399
covid 19           37
Covid 19           24
COVID19            11
covid19            8
Covid19            6


Table 3 lists the spelling variants found in the corpus, with covid-19 as the most
frequent. Interestingly, despite being the form officially used in texts written by
experts (much like OMS for EU Portuguese and WHO for English), COVID-19 accounts for
only 6.7% of the occurrences in the corpus, compared to 47% for covid-19, as
represented in Graph 1.
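The percentages quoted above can be reproduced from the counts in Table 3. The short Python
sketch below assumes that each share is computed over the sum of the eleven variants listed
in the table; this is our reading of the figures, not a procedure stated by the authors.

```python
# Occurrences of each spelling variant, copied from Table 3.
counts = {
    "covid-19": 10611, "Covid-19": 5455, "covid": 3224, "COVID-19": 1521,
    "Covid": 1274, "COVID": 399, "covid 19": 37, "Covid 19": 24,
    "COVID19": 11, "covid19": 8, "Covid19": 6,
}

total = sum(counts.values())

# Share of each variant as a percentage of all listed occurrences.
shares = {variant: round(100 * n / total, 1) for variant, n in counts.items()}

for variant in ("covid-19", "Covid-19", "COVID-19"):
    print(f"{variant}: {shares[variant]}%")
```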


[Pie chart: COVID-19 spelling variants. Legible labels: covid-19 (47.0%), Covid-19 (24.2%),
COVID.]

Graph 1: Covid-19 vs COVID-19.


The remaining spelling variants, in turn, not only have a low number of occur- 
rences, but also a short period of evidence throughout time. This can be observed 
in the diachronic spectrum of the corpus, as represented in Graph 2. 

Graph 2 depicts the diachronic distribution of the COVID-19 spelling variants 
between November 2020 and July 2021. The spelling variants are systematised by 
trimester and according to per million frequencies, like the example for covid-19 in 
Table 4. 

Thus, as observed in the corpus, some spelling variants of COVID-19 seem to have a
shorter lifespan than others. This is the case of variants that have no evidence in the
last trimester (16MAY21-11JUL21). These short-lived variants, namely covid19, Covid 19,
Covid19, and COVID19, may reflect the instability of these new lexical units in
discourse, given their recent lexicalisation.
[Stacked bar chart: Distribution of "COVID-19" spelling variants between NOV2020 and
JUL2021. Horizontal axis: spelling variants. Vertical axis: 0% to 100% of the frequency
per million tokens (PressCoronaCorpus). Series: 16NOV20-15JAN21, 16JAN21-15MAR21,
16MAR21-15MAY21, 16MAY21-11JUL21.]

Graph 2: Diachronic distribution of COVID-19 spelling variants (Nov. 2020 - Jul. 2021).


Table 4: Systematisation of the spelling variant covid-19 according to the per million
frequencies by trimester (Nov. 2020 - Jul. 2021). Values are frequencies per million
tokens, with the corresponding percentage of tokens in brackets.

{covid}*    16NOV20-15JAN21   16JAN21-15MAR21   16MAR21-15MAY21   16MAY21-11JUL21
covid-19    296.24 (0.03%)    296.24 (0.03%)    207.07 (0.021%)   192.03 (0.019%)
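For readers unfamiliar with normalised frequencies, the relation between the two figures in
each cell of Table 4 can be sketched as follows. The function names are ours, and the
second example assumes that the counts in Table 3 refer to the whole corpus (41 610 865
tokens, Table 2), which the text does not state explicitly.

```python
def per_million(occurrences, corpus_tokens):
    """Normalised frequency: occurrences per one million tokens."""
    return occurrences / corpus_tokens * 1_000_000


def per_million_to_percent(freq_pm):
    """Convert a per-million frequency into a percentage of all tokens."""
    return freq_pm / 10_000


# The per-million values of Table 4 and their rounded percentages.
for freq in (296.24, 207.07, 192.03):
    print(freq, "per million =", round(per_million_to_percent(freq), 3), "%")

# Illustration with "covid 19": 37 occurrences (Table 3) over the full corpus.
print(round(per_million(37, 41_610_865), 2), "per million")
```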


To further illustrate this short diachrony, Graph 3 shows the time span of 13 forms
starting with covid- that we have identified in the corpus.


[Timeline chart, December 2020 to June 2021, showing the occurrences of forms starting
with covid-. Forms legible in the legend: covid-20, covid-free, covidário, covideiros,
covidência, covidiana, covid-24, covid-25, covid-30, covid-Política, covid-positiva,
covidade.]

Graph 3: Some examples of lexicalisations with short diachrony/short time span.
Focusing on January 2021 (highlighted with bullets in Graph 3), we can observe that the
lexical unit covidário has its first occurrence in the corpus in December 2020 and
quickly reaches its highest frequency in January, with [8] occurrences. This number of
occurrences remains until March, finally dropping to [1] occurrence in April and
remaining as such until July 2021. On the other hand, the lexical units covideiros and
covidência, with [4] and [2] occurrences respectively, do not appear beyond February,
evidence from which we hypothesise a short time span, given the diachronic spectrum of
the corpus. The lexical units covidade [3], covid-25 [2] and covid-positiva [1] are three
examples with a short time span, which begins in February and ends in April. Finally, the
lexical unit covidiana [1] is the form with the longest time span among the forms under
focus here, i.e., it occurs between November 2020 and May 2021. It should be noted that
November is not the first attestation of the form, but corresponds to the onset of the
corpus compilation.


5 COVID-19: general considerations on word formation


The first word to be analysed is COVID-19, an acronym formed from the initial segments of
the constituents of the polylexical term coronavirus disease 2019. The acronym results
from a truncation process: the truncated elements co- and vi-, plus the initial d of
disease. The number 19 points to 2019: the World Health Organization (WHO) first learned
about this new virus on 31 December 2019, following a report of a cluster of cases of an
unknown pneumonia disease in Wuhan, People's Republic of China.⁷ The WHO's proposal
complies with the best practices recommended in 2015 by this organisation for the
designation of new human infectious diseases.⁸ The English acronym has been imported into
Portuguese, despite its complete lack of connection to the corresponding Portuguese
polylexical term, which is doença do coronavírus 2019. COVID-19 is, therefore, a hybrid
acronym formed by the initials of the constituents of the English polylexical unit and the
reduction of the number 2019, which becomes stabilised in speech and language through a
lexicalisation process. Speakers assimilate the acronym as a monolexical unit and
integrate it into their lexicon, where it functions as a noun, as attested in our corpus:


7 https://www.who.int/emergencies/diseases/novel-coronavirus-2019/question-and-answers-hub/q-a-detail/coronavirus-disease-covid-19 (last access: 10 June 2022).
8 https://www.who.int/publications/i/item/WHO-HSE-FOS-15.1 (last access: 10 June 2022). 
(1) a COVID-19 veio complicar ainda mais a adequada gestão das doenças raras
[COVID-19 (f.n.) further complicated the adequate management of rare diseases]


(2) A presença física no escritório nunca foi obrigatória para a generalidade dos nossos
colaboradores ao longo dos últimos oito meses de coexistência com a COVID-19
[Physical presence at the office was never mandatory for most of our employees over the
last eight months of coexisting with COVID-19 (f.n.)]


(3) Será possível exigir testes de diagnóstico para a COVID-19 nos estabelecimentos
de saúde [The requirement of diagnostic tests for COVID-19 (f.n.) will become
possible in health facilities]


Given that COVID-19 refers to a disease (doença, in Portuguese), the term is feminine in
Portuguese and behaves mostly like a noun in discourse. Nevertheless, the use of the term
in the masculine gender also occurs, as documented in the DPLP (cf. Figure 1): “É também
usado como substantivo masculino.” [It is also used as a masculine noun.] (DPLP, 2021).
This happens because speakers, by metonymy, designate the disease after the virus causing
it, which also happened with other diseases (e.g., the Zika virus and the Zika disease):


COVID-19 


(acrónimo do inglês coronavirus disease 2019, doença de coronavírus 2019 [ano em que a 
doença foi identificada pela primeira vez]) 
nome feminino 
[Medicina] Doença infecciosa respiratória, causada pelo coronavirus SARS-CoV-2, cujos 
sintomas podem incluir febre, tosse, dificuldades respiratórias e cansaço, e que, em alguns 
casos, pode progredir para pneumonia ou falha respiratória. 
Nota: Também se escreve com minúsculas (covid-19). É também usado como substantivo 
masculino. 


Figure 1: Lexicographic article COVID-19 in DPLP. 


Our research has identified several new lexical units integrating the initial acronym,
whereby covid- takes on the role of a formative, thus undergoing a re-categorisation.
According to Bauer (2003: 330), a formative is “a recurrent element of form which
correlates with derivational behaviour in some way and yet cannot be identified with a
morph”.


[Diagram starting from the polylexical unit: coronavirus disease 2019]

Figure 2: The evolution of the disease designation until it appears as a formative.
As represented in Figure 2, COVID-19 is the acronym of the polylexical unit
‘coronavirus disease 2019’, which in turn, through a process of ellipsis, loses the
hyphen and the reference to the year. When the acronym COVID is written in lowercase
(covid), the lexical unit is perceived as a noun. The term covid-19, a feminine noun, is
the most frequent in the corpus, with covid being the most productive form from a
morphological standpoint. Hence, COVID is a highly productive lexical unit, behaving as a
base form upon which a set of morphological processes act, giving rise to new lexical
units while undergoing re-categorisation processes.

The lexical categories of the term covid found in PressCoronaCorpus are noun 
(1) and adjective (2): 


(1) O futuro, se a covid nos permitir, é sempre risonho, nós temos de olhar para a 
frente sempre com perspetivas de construção [The future, if covid allows us, is 
always bright, we must always look forward with constructive perspectives]. 


(2) Ontem, numa corrida só com quatro atletas — por precauções ‘covidianas’, 
quando nos 5000 marcha, durante muitos minutos, andaram todos juntos [Yes- 
terday, in a race with only four athletes — for ‘covidian’ precautions, when in 
the 5000 march, for many minutes, they all walked together]. 


Regarding word formation, we have observed the typical word formation processes:
derivation and composition. In the case of derivation, words are formed through the
addition of a prefix or a suffix to the base, which can be a stem, a theme (stem +
thematic vowel) or a lexical unit. In this paper, we focus on the acronym COVID, which is
the base form of the lexical units we analyse in section 6. An acronym is a lexical unit
formed by a word formation process in which an initialism is read syllabically; it is
also a morphological constituent on which word formation operations take place, thereby
allowing the creation of formally and semantically derived monolexical neologisms.

Composition, in turn, is a word formation process that operates by concatenat- 
ing two or more word stems or two or more words. In Portuguese, there are two 
types of composition: morphological composition, which concatenates stems ac- 
cording to the principles of morphological word formation, and morphosyntactic 
composition, according to which properties of syntactic structures and properties of 
morphological structures are combined. The examples we selected illustrate cases 
of morphosyntactic compounds: ala covid and doente covid. These compounds are 
formed by an adjunct structure, that is, they are constituted by two nouns, with 
similar behaviour to nominal syntactic structures. The right constituent (covid in 
both examples) functions as a nominal modifier, generating a new lexical unit. 

Considering the dynamics of the acronym covid and since it is highly productive 
from a morphosyntactic point of view, we use the term to describe formation of lexi- 
cal units in which covid is an endocentric formative. 
6 Analysis of morphosyntactic and semantic formation of the neologisms and neoterms


In this section, we observe the occurrences of the base form covid to identify behav- 
iours and regularities in word formation, as well as its associations with other elements 
(prefixes, suffixes), to verify lexical productivity and determine the semantic compo- 
nent of the elements. Based on the data analysis, we will justify whether these words 
can be considered neoterms or, on the contrary, if having a term in their formation cor- 
responds to a “false” intuition. As stated by Lombard/Huyghe/Gygax (2021), the neolog- 
ical intuition is an essential feature of neologisms, which can vary according to the 
individuals and the regularity of lexical creativity processes. 

One of the most regular and productive word-formation processes in Portuguese is
derivation. Derivation is distinguished from composition in that, contrary to the latter,
there is only one autonomous unit of lexical meaning (the base), to which an affix (prefix
or suffix) is added to form a new lexical unit. As examples of derived words, we selected
the noun covidário [covidarium] (whose occurrence in the corpus remained stable over four
months) and the adjective covidiana [covidian] (one of the first occurrences identified in
the corpus). In these examples, two suffixes, -ário and -(i)ano or -(i)ana (with a
feminine inflexion mark), are added to the base covid-. These suffixes occurring after the
base determine the lexical category of the newly formed nominal base derivatives (-ário,
which forms nouns, and -(i)ano, which is highly productive in the formation of
adjectives), and are also carriers of semantic values. Covidário denotes a place where
certain entities remain or are housed (i.e. it has a locative value, being a locative
denominal noun according to Rio-Torto et al. (2013)), such as in aviário (aviary),
berçário (nursery), fraldário (diaper changing room), infantário (nursery school) and
solário (solarium). In turn, the denominal adjective covidiano is formed by the adjectival
suffix -(i)ano, which denotes a living being, as in bacteriano (bacterial).

Again, within the processes of derivation, anticovid is an example of a word 
formed by prefixation. The prefix anti- combines with the nominal base, covid, 
changing the base word's lexical category (covid N > anticovid ADJ). Due to its se- 
mantic, oppositional value, the prefix anti- is combined with bases denoting entities 
(a disease, in this case), without number inflexion. 

As an example of morphosyntactic composition, we chose covid-drive, a hybrid 
compound that occurs twice in our corpus. In terms of its constitution, the noun 
covid is combined with another English-imported noun making up an [N + N] struc- 
ture. Concerning morphosyntactic compounds, we highlight occurrences such as 
ala covid (covid ward) or doente covid (covid patient), which can be included within 
the so-called “modifying compounds” (Rio-Torto et al. 2013: 93), in that the second 
element modifies the noun. We are in the presence of an [N + N] structure: 
(1) [[ala] N1 + [covid] N2] polylexical unit (compound)


(2) [[doente] N1 + [covid] N2] polylexical unit (compound)


In both examples (1) and (2), the noun covid qualifies the N1, allowing us to infer that
ala covid denotes a ‘place where patients with the disease are housed/hospitalised’. In
the second case, doente covid denotes a ‘person with covid disease’. In such situations,
only time tells whether these compounds will become lexicalised and, consequently, whether
the lexicographer should describe them in a dictionary, which in these cases has already
happened. Both ala covid and doente covid are neologisms, not because the process of word
formation is innovative, but because the lexical distribution of their elements has a
novelty effect.

From a semantic point of view, these neologisms are formed by lexical units 
belonging to the lexicon of general/current use, ala and doente, to which the term 
covid is associated. The question that arises is whether the compound resulting 
from the combination of these two units forms a neologism or a neoterm. Following 
our analysis, these units occur in specialised contexts, mostly in public health dis- 
course, but they are not exactly terms because they do not belong to any particular 
specialised domain. This implies that, in the context of a lexicographic work, these 
units would not be classified as belonging to medicine, biology, virology, epidemiology
or public health through domain labels. In these two examples, the specificity of the
<covid> concept is nullified because the characteristics /Respiratory System Disorder/,
/Pneumonia/ and /Viral Pneumonia/ are cancelled, thus losing the semantic value
associated with specialised domains related to the disease.⁹

A different process is that of the lexical unit covidário, which is a derivative formed
by analogy, for example, with berçário [nursery]. Covidário can be defined as an isolated
space in a health facility dedicated to covid patients. In this case, the -ário suffix
does not have a specialised sense, and the derivative is therefore not a term; it is
classified as a neologism, although it occurs in specialised contexts, especially those
related to hospitals. Curiously, however, the DLP, for example, considers covidário a
term of Medicine, but does not consider berçário a term belonging to that same domain,
despite referring to hospitals in the definition. Have lexicographers been misled by the
semantic value of the base covid? We find that whenever the formative covid appears,
there is a tendency to consider the new lexical unit a term (e.g. covidiota,
covidivórcio).

Lastly, the covidiota occurrence clearly results from a process of word formation 
in which the specialised sense of covid is lost. From our point of view, this is a process 
of determinologisation. Covidiota is divided into covid + idiota, being a morphological 
compound, which is used to refer derisively to a person who does not respect general 


9 http://covidterm.imicams.ac.cn/#/search?isAdvanced=false&keyword=covid (last access: 
10 June 2022). 
safety measures, either voluntarily or involuntarily. This unit is a neologism of form
and meaning. Formally, it is a portmanteau that corresponds to the blending of two 
lexical units in which one of the units is truncated: id being the end of the covid unit 
and the beginning of the idiota unit. We are faced with a haplology, which corre- 
sponds to the elimination of one of two consecutive syllables when they are identical 
or very similar (see Marquilhas 2014: 28). This neologism has no specialised value. 

As we can see, it is not always self-evident whether a lexical unit is specialised 
or not. The fact that the covid acronym and its variants are morphologically produc- 
tive and dynamic (see Table 5) requires an accurate analysis of each of the cases. 


Table 5: The productivity of the covid acronym.

LEXICAL UNIT   POS   STRUCTURE                            WORD FORMATION
covid          N     Abbreviation of a polylexical unit   ACRONYM
covidário      N     [covid]base + [ário]suf.             DERIVATION
covidiana      ADJ   [covid]base + [(i)ana]suf.           DERIVATION
ala covid      N     [ala]N + [covid]N                    COMPOSITION
covidade       N     [covid]base + [idade]suf.            DERIVATION [HAPLOLOGY]
covidiota      N     [covid]base + [idiota]N              COMPOSITION [PORTMANTEAU]
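The formal side of these formations, including the haplology described above, can be
imitated with a few lines of Python. This is only a string-level sketch under our own
assumptions (the function name blend and the rule of merging the longest shared segment
are illustrative); it says nothing about the semantic or categorial effects of each
process.

```python
def blend(base, second):
    """Concatenate two forms, writing only once any segment that ends the
    first form and begins the second (a crude model of haplology)."""
    # Try the longest possible overlap first, then shorter ones.
    for size in range(min(len(base), len(second)), 0, -1):
        if base.endswith(second[:size]):
            return base + second[size:]
    # No shared segment: plain concatenation, as in ordinary suffixation.
    return base + second


# Structures taken from Table 5.
for base, second in [("covid", "ário"), ("covid", "iana"),
                     ("covid", "idade"), ("covid", "idiota")]:
    print(f"{base} + {second} -> {blend(base, second)}")
```

With -ário and -(i)ana nothing overlaps, so the result is simple suffixation; with -idade
and idiota the shared segment id is written once, which yields covidade and covidiota.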


The determinologisation process is evident in the formation of new general lan- 
guage words, as shown by the examples in Table 6. Reusing the covid acronym as a 
formative element contributes to the process of determinologisation, since the core 
meaning is used in a superficial manner. 


Table 6: Determinologisation process.

covidiota       [+ neologism]   [- neoterm]   [composition | portmanteau]
covideiro       [+ neologism]   [- neoterm]   [derivation | suffixation]
covidices       [+ neologism]   [- neoterm]   [derivation | suffixation]
covidivórcios   [+ neologism]   [- neoterm]   [composition | portmanteau]

These lexical units are used in a covid context, but they neither convey specialised 
features, nor do they belong to a conceptual system of a domain. 
7 Lexicographic treatment of neologisms associated with COVID-19


After the extraction and analysis of neologisms from PressCoronaCorpus, we now move on to
the lexicographic treatment of four neologisms (covid, covidário, covidiana and
anticovid) previously selected and registered in Portuguese language e-dictionaries,
namely DPLP and DLP.

These two lexicographic resources were selected because (i) they are available 
online, (ii) they are constantly updated with neologisms, both from general lan- 
guage and specialised language, and (iii) they have a very broad list of headwords, 
each having more than 100,000 entries. 

DPLP is a contemporary Portuguese dictionary with about 133,000 lexical entries, 
whose headword list comprises general language vocabulary as well as terms from 
various specialised domains. This resource also offers the possibility of browsing en- 
tries in European Portuguese spelling, following the 1990 Portuguese Language Or- 
thographic Agreement, and in Brazilian Portuguese, with and without the changes 
prescribed by this agreement. 

DLP is a monolingual Portuguese dictionary that is integrated into the infopedia.pt
service,¹⁰ which provides 30 bilingual online dictionaries in several languages
(Portuguese, Portuguese Sign Language, English, Spanish, French, German, Italian, Dutch,
Chinese, Tetum and Greek). Following European Portuguese spelling, it has two versions:
one according to the 1990 Orthographic Agreement of the Portuguese Language, and the
other according to the previous standard, that is, the Portuguese-Brazilian Orthographic
Agreement of 1945.

The pandemic, as the term itself implies (pan-, Greek pan, all), caused the rapid 
and simultaneous entry of new words — the aforementioned neologisms — in languages 
around the world. The urgency to publish neologisms of high daily frequency in real 
time and the need to satisfy the searches of dictionary users often lead to some rash 
decisions, not allowing lexicographers appropriate and timely reflection on the phe- 
nomena and consequent validation of data. Aware of this problem and of certain limi- 
tations, we proceed with the comparison of the lexicographic treatment of neologisms 
associated with the pandemic crisis, intending to answer the following questions: 

1) Do DPLP and DLP register the neologisms detected in PressCoronaCorpus and 
which are currently under analysis? 

2) Do DPLP and DLP account for these units as neologisms? 

3) Do DPLP and DLP consider covid as a formative? 


The first step was to check whether these units occur in DPLP and DLP. We found 
that the above-mentioned units are attested in both dictionaries. 


10 https://www.infopedia.pt/ (last access: 10 June 2022).
As we can see in Figure 3, the neologisms covidário and covidiano show morphological
structures specific to Portuguese, that is, they are considered words derived by
suffixation (suffixes -ário; -(i)ano).


DPLP

co·vi·dá·ri·o
(COVID[-19] + -ário)
nome masculino
Local, devidamente isolado e equipado que, num estabelecimento de saúde, se
destina ao atendimento e ao tratamento de doentes com suspeita ou confirmação de
infecção por COVID-19 (ex.: o hospital criou um covidário para reduzir os riscos de
contágio).


DLP

covidário
co.vi.dá.ri.o kovi'darju
nome masculino
MEDICINA área, em hospital ou outra instituição de saúde, especialmente preparada para o
atendimento e tratamento de doentes com COVID-19, bem como de indivíduos suspeitos de
estarem infetados pelo vírus que origina essa doença
De COVID[-19]+-ário


DPLP

co·vi·di·a·no
(COVID[-19] + -iano)
adjectivo
Relativo a uma doença infecciosa respiratória causada pelo coronavirus SARS-CoV-2;
relativo à COVID-19 (ex.: confinamento covidiano). = COVIDICO


DLP

covidiano kovi'djenu
adjetivo
relativo à COVID-19 (doença respiratória viral causada pelo coronavirus SARS-CoV-2)
De COVID[-19]+-iano


Figure 3: Lexicographic articles covidário and covidiano (DPLP, DLP).


The first lexicographic data analysed pertains to the formation of words. This
information is shown in DPLP in italics between brackets, below the syllabic division of
the words. DLP, on the other hand, makes use of an icon to show information about word
formation, thereby requiring the user to hover the mouse cursor over the icon to access
this information. The two lexicographic resources coincide in the analysis:
COVID[-19] + -ário. The first point that catches our attention is that this information
does not indicate the possibility of covid being treated as a formative (covid-) for new
words, indicating instead that these words (covidário and covidiano) are formed from the
original acronym rather than from the noun covid itself.

Another topic, often controversial in lexicography, lies in the use of the Medic- 
ina [Medicine] domain label in the covidário article in both dictionaries — especially 
when confronted with the word covidiano, which does not have any domain label. 
We may question whether this word belongs, in fact, to the medical domain or 
whether it constitutes a process of determinologisation. Even so, this topic goes be- 
yond our scope, since domain labels in general language dictionaries, in many 
cases, only function as mere identifiers or word sense disambiguators. Moreover, 
the dictionaries do not inform us about the criteria for using this label, so we could 
only make assumptions regarding their use. 

Moving on to the illustrative example of derivation by prefixation (starting with the
prefix anti-), we have the anticovid entry. Although our corpus shows the hyphenated
spelling anti-covid, according to Portuguese orthography the word must be written in an
agglutinated form, as evidenced by the entries in the dictionaries. However, while DPLP
registers anticovid, DLP registers anticovid-19, where once again the word formation
points to the COVID-19 acronym (see Figure 4).


DPLP

an·ti·covid
(anti- + COVID)
adjectivo de dois géneros e de dois números
Que se destina a combater ou proteger contra a COVID-19 (ex.: máscara anticovid; vacinas
anticovid)


DLP

anticovid-19
&tikovid daze'nov(a)
adjetivo invariável
que visa combater ou prevenir a COVID-19 (doença respiratória viral causada pelo
coronavírus SARS-CoV-2)
De anti+COVID-19


Figure 4: Lexicographic article anticovid (DPLP, DLP).


We will now turn to the analysis of the units denoting the ‘severe acute respiratory 
syndrome’ COVID-19 (see Figure 5). The orthographic forms are treated differently 
in each dictionary: the lemma registered in DPLP is the acronym in uppercase 
(COVID-19), while DLP chose as lemma the acronym together with its lowercase 
form, which can be considered as a spelling variant. The COVID entry is also pres- 
ent in DPLP, which further notes the lowercase form: “Também se escreve com mi- 
núsculas (covid)” [Also written in lowercase (covid)]. 


DPLP

COVID
(redução de COVID-19)
nome feminino
[Informal] [Medicina] Doença infecciosa respiratória, causada pelo coronavírus SARS-CoV-2,
cujos sintomas podem incluir febre, tosse, dificuldades respiratórias e cansaço, e que, em
alguns casos, pode progredir para pneumonia ou falha respiratória. = COVID-19
Nota: Também se escreve com minúsculas (covid). É também usado como substantivo
masculino.

COVID-19
(acrónimo do inglês coronavirus disease 2019, doença de coronavirus 2019 [ano em que a
doença foi identificada pela primeira vez])
nome feminino
Doença infecciosa respiratória, causada pelo coronavirus SARS-CoV-2, cujos sintomas podem
incluir febre, tosse, dificuldades respiratórias e cansaço, e que, em alguns casos, pode
progredir para pneumonia ou falha respiratória.
Nota: Também se escreve com minúsculas (covid-19). É também usado como substantivo
masculino.


DLP

COVID-19, covid-19
kovid deze'nov()
nome feminino
MEDICINA doença respiratória causada por um coronavírus (SARS-CoV-2), apresenta
sintomatologia variável, desde casos assintomáticos ou formas de intensidade ligeira
(cujos sintomas podem incluir febre, tosse, fadiga ou dores musculares) até situações
graves (sobretudo em idosos ou pessoas com problemas de saúde preexistentes), que podem
evoluir para cenários de pneumonia, falência de múltiplos órgãos e eventual morte;
(inicialmente identificada na China, em 2019, atingiu estatuto de pandemia em 2020)
Do inglês Corona virus disease 2019, «doença de coronavirus 2019», (ano em que foi
identificado o primeiro surto da doença)


Figure 5: Lexicographic articles regarding COVID-19 (DPLP, DLP).


Looking at Figure 5, we see that the unit is classified as a feminine noun in both
dictionaries, although DPLP also notes the possibility of it being used as a masculine
noun: “É também usado como substantivo masculino.” [“It is also used as a masculine
noun.”]. Although the feminine gender is recommended, our analysis of the corpus actually
attests to the fluctuation in grammatical gender. Instead of resorting to the notes
field, which often includes very general information, a better structuring of the data
would place the information about gender fluctuation in the grammatical information
field itself. That is, where nome feminino [feminine noun] appears, lexicographers should
add nome masculino [masculine noun], if the intention is to attest to actual usage in
corpora, or simply move the note closer to the gender field, since it is purely
grammatical information. It should be noted that in the DPLP case, we have two
consecutive notes of a different nature: one focusing on the spelling, indicating that
the form is also written in lowercase, and another with a grammatical scope, referring to
the word’s gender.

We now turn to the matter of the domain labels. While DLP classifies COVID-19 as
belonging to medicine, with the label preceding the definition, DPLP shows two usage
labels: a diaphasic label, Informal, and a diatechnical label, Medicina. As
lexicographers and terminologists, we may infer the editors’ intentions in including
these two labels of a different nature, even though they may seem contradictory. However,
we question whether an ordinary dictionary user is able to understand these labels. In
our opinion, the Medicina label specifies the specialised domain in which COVID is a
term, since it denotes a disease. On the other hand, the Informal diaphasic label in the
covid entry may be justified by its distancing, both semantically and formally, from the
original concept of <COVID-19>, moving from its original context in the medical domain to
a less specialised and more general context. However, we may question the use of this
diaphasic label, since the fact that a given term becomes popularised does not
necessarily mean that it starts being used in informal contexts. On the scale usually
established between the informal and formal registers, the use of the reduced form covid
can be situated in a neutral language register, or can even be used in specialised
contexts. In any case, we do not see any advantage in the combination of these two
labels, which may even confuse the user or raise further doubts.

Lastly, regarding term formation, DPLP highlights the fact that it is an English 
acronym. 

Concluding our analysis, we are now in a position to answer the above-mentioned 
research questions: 

1) DPLP and DLP record the neologisms detected in PressCoronaCorpus and selected for
analysis. It is important, however, to mention that, with regard to the suffixal
derivation process, we had at first chosen covidência, covidade and covidização as
possible candidates for analysis, due to their strong neological character. However, at
the time of writing this paper, none of these lexical units was recorded in the
dictionaries, so we had to exclude them.

2) None of the dictionaries have any identifying marks for neologisms or new 
entries. 

3) None of the dictionaries shows covid as an element for forming new words, re- 
ferring instead to the original acronym (COVID-19). 
8 Template proposal for a lexicographic article


Following a thorough analysis of the terms extracted from our corpus, along with the
lexicographic treatment observed in the two dictionaries under study, we propose a
template for the term covid. This proposal should bear in mind the following points:

1) the term covid should display its original form, both as an acronym and as a 
noun in lower case; 

2) collocations should be included and highlighted; 

3) the term covid should be analysed as a formative element of new words. 


In Figure 6, the entry referring to the disease is presented. In the “entry” field, the 
two forms identified in the corpus are registered. The lowercase form has more 
occurrences (47% as shown in Graph 1), followed by the acronym form. The “part of 
speech” field accounts for the category (n. = noun) and the gender (f. = feminine). 
Gender fluctuation is not considered here, given that the use of the masculine form 
is to be avoided and its inclusion may confuse the end-user. This information and 
other related questions about orthographic rules, for example, could be given by 
links pointing to other lexical resources, such as spelling manuals or orthographic 
vocabularies, which can help clarify user questions. 

The selected domain label points towards the Medical and Health Sciences 
domain (Costa et al. 2020). Since the lexicographic definition, starting with “infectious 
disease”, seems to provide sufficient clarification, the information pertaining to the 
domain label may be hidden from the user but is still useful to retrieve information 
for lexicographic purposes. In any case, its insertion is justified, as it allows the lex- 
icographer to better control the terminology and future semantic associations made 
between this term and other related terms. After the lexicographic definition, which 
should be as objective and simple as possible, there are usage examples extracted 
from our corpus. In addition to these usage examples, the observed collocations are 
registered and also illustrated via real usage contexts. 

Should there be any general observations, as exemplified in Figure 6, these can be 
supplied under ‘note’. 


covid-19, COVID-19 n.f. 

Domain label: Ciências Médicas 
doença infeciosa causada por um coronavírus (SARS-CoV-2) 

Example: Devido ao surto de covid-19 vindo da variante sulafricana, a Bélgica, país onde a maioria das corridas irá para a frente, pôs em aberto o cancelamento 

Collocation: vencer a covid-19; contrair COVID-19; tratar a COVID-19 
Example: à medida que começamos a ver o verdadeiro impacto dos tratamentos adiados em 2020 e os efeitos de longo prazo sobre aqueles que contraíram covid-19 

Note 1. Acrónimo inglês ‘coronavirus disease 2019’, ‘doença do coronavirus 2019’. 


Figure 6: Lexicographic article regarding covid-19 and COVID-19. 
Lastly, since covid is used as an element which forms new words, its entry as a 
formative is also presented in Figure 7. The entry, which ends in ‘-’ in this case, 
clearly showing that this is a word formation element, is followed by the respective 
grammatical information, as well as the domain label and related sense. The 
examples are given after that. Again, if other types of information are needed, the 
‘note’ field can be used. 


covid- elemento de formação 

Domain label: Ciências Médicas 
que exprime a noção de covid (doença) 

Example: covidário; covidente; covidiano 

Note 1. Da forma reduzida covid-19, pelo acrónimo inglês ‘coronavirus disease 2019’, ‘doença do coronavirus 2019’. 


Figure 7: Lexicographic article regarding covid. 


With these two examples, we believe we have shown that a more rigorous and better 
segmented structuring process of lexicographic data can bring clarity to the entries 
and, in turn, to end users. The spelling variants (the lowercase noun and the 
uppercase acronym) are displayed as lemmas from the start, thereby preventing 
information of identical scope from being dispersed. The option to resort to the domain label 
does not necessarily mean that this unit is only used in specialised contexts. 
Instead, it helps to frame the unit within a previously outlined domain taxonomy. 
Although corpus-based examples, and mainly collocations, play a key role in 
helping the user observe those units in real usage contexts, this seems not to be particularly 
valued by either DPLP or DLP. Notes, too, will always be helpful in providing other 
types of information which may be useful to the end user. Ultimately, the innovative 
contribution of our approach is to introduce an entry for covid as a formative element. 
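
The template described in this section can also be read as a small data model. The following is only an illustrative sketch in Python, not part of the proposal itself: the class name `Entry`, its field names and the `show_domain_label` switch are our own assumptions, chosen to mirror the fields shown in Figures 6 and 7 (entry, grammatical information, domain label, definition, examples, collocations, note).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """Hypothetical container mirroring the fields of Figures 6 and 7."""
    lemmas: List[str]                 # spelling variants shown together as lemmas
    grammar: str                      # e.g. "n.f." or "elemento de formação"
    domain_label: str                 # kept for the lexicographer even if hidden
    definition: str
    examples: List[str] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

    def is_formative(self) -> bool:
        # A lemma ending in "-" marks a word formation element.
        return any(lemma.endswith("-") for lemma in self.lemmas)

    def render(self, show_domain_label: bool = True) -> str:
        # Build a plain-text view; the domain label can be hidden from the user.
        lines = [f"{', '.join(self.lemmas)} {self.grammar}"]
        if show_domain_label:
            lines.append(f"Domain label: {self.domain_label}")
        lines.append(self.definition)
        for example in self.examples:
            lines.append(f"Example: {example}")
        if self.collocations:
            lines.append("Collocation: " + "; ".join(self.collocations))
        for number, note in enumerate(self.notes, start=1):
            lines.append(f"Note {number}. {note}")
        return "\n".join(lines)


# Data taken from Figures 6 and 7 (corpus examples omitted for brevity).
covid_noun = Entry(
    lemmas=["covid-19", "COVID-19"],
    grammar="n.f.",
    domain_label="Ciências Médicas",
    definition="doença infeciosa causada por um coronavírus (SARS-CoV-2)",
    collocations=["vencer a covid-19", "contrair COVID-19", "tratar a COVID-19"],
    notes=["Acrónimo inglês 'coronavirus disease 2019'."],
)

covid_formative = Entry(
    lemmas=["covid-"],
    grammar="elemento de formação",
    domain_label="Ciências Médicas",
    definition="que exprime a noção de covid (doença)",
    examples=["covidário; covidente; covidiano"],
)

if __name__ == "__main__":
    print(covid_noun.render(show_domain_label=False))
    print()
    print(covid_formative.render())
```

Keeping the domain label as stored data while making its display optional reflects the point made above: the label may be hidden from the user yet remain available to the lexicographer for controlling terminology.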


9 Concluding notes 


In this paper, we analysed new lexical units which have arisen in European Portu- 
guese amidst this pandemic situation. These new lexical units — neoterms and neolo- 
gisms — emerged for two main reasons. Firstly, the experts needed to designate new 
concepts which appeared within a specialised context. Secondly, there is also the 
need to transfer information produced by experts to a non-expert audience. This 
knowledge transfer is carried out not only by experts, especially from public health 
settings, but also by journalists and other authors, via discourse production. When 
this knowledge is transferred to a non-expert audience, information is lost, given that 
the latter does not have the required knowledge to understand the specialised content 
of that message. On the other hand, journalists, also a non-expert group, interpret 
specialised texts and try to reproduce the information, and therefore information is 
very likely to get lost, in terms of both rigour and precision. 

While terminology use is present throughout these communication scenarios, 
neoterms sometimes lose their status and become simple neologisms, thereby lead- 
ing to determinologisation processes, since there is a context shift which entails a 
loss of their specialised nature. The extremely fast pace at which new units emerge, 
either neologisms or terms, has a strong impact on the lexicographer's work. The 
urgency of publishing neologisms of high daily frequency in real time, as well as 
the need to meet the research requirements of general language dictionary users, 
often lead to a certain hastiness, not giving lexicographers the opportunity to con- 
duct a thorough analysis and subsequent validation of their data. 

On the other hand, dictionaries can be “descriptive” or “prescriptive/normative”, 
establishing the model to follow. Prescriptivism is an approach that attempts to de- 
termine the rules of correct usage of a language, while descriptivism is an approach 
that analyses and describes how the speakers of a language actually use it. Con- 
cerning the dictionary as a language model, descriptive guidance has become more 
usual, a process facilitated by the fact that lexicographers can access increasing 
amounts of corpora to support their descriptions. We maintain this approach, even 
though we consider that descriptive dictionaries benefit from a certain normative 
tone (hence we do not consider the occurrences of covid as a masculine noun in our 
proposal). Users ultimately resort to dictionaries to clarify their doubts and to en- 
sure a correct usage of language. 

As stated by Nová (2018: 397), “there is probably no universal way to treat de- 
terminologized words, but many of them need a special approach”. Some fields can 
be used for this purpose, as is the case of the notes field, as we have shown. How- 
ever, it is necessary to take into account that we are dealing with general language 
dictionaries, i.e., the notes should never be too long. 

Aware of the difficulty of registering neologisms in general language dictionar- 
ies, especially in the context of an on-going pandemic in which hundreds of speci- 
alised units are subject to daily analysis, we found that Portuguese lexicographic 
resources would greatly benefit from presenting information extracted from analy- 
sis corpora and gradually accounting for observed phenomena, such as the deter- 
minologisation of terms. In this sense, our paper intends to be a contribution to the 
advancement of national lexicography. 
Mireille Vale, Rachel McKee 

Neologisms in New Zealand Sign 
Language: A case study of COVID-19 
pandemic-related signs 


1 NZSL background and lexicography 


New Zealand Sign Language (NZSL)¹ is estimated to be used by 3,000-5,000 Deaf 
people in New Zealand, with a larger group of just over 20,000 New Zealanders 
able to “have a conversation about a lot of everyday things” in the language.² Prior 
to the development of interpreting services in the 1980s and acceptance of NZSL in 
education from the 1990s, NZSL was used mainly for communication in private, so- 
cial domains, which restricted the size and fields of lexicon. 

Linguistic documentation of NZSL began in the mid-1980s (Collins-Ahlgren 1989, 
Levitt 1986). Early lexicographic efforts culminated in the print Dictionary of New Zea- 
land Sign Language (Kennedy et al. 1997) followed by a Concise Dictionary of New Zea- 
land Sign Language (Kennedy et al. 2002) comprising the 2,000 most frequent signs. 
These print dictionaries were amongst the first corpus-based signed language dictio- 
naries that used data from signed language as the source of lexicon rather than being 
a translational glossary from the spoken language. Lexical documentation was based 
on the systematic analysis of video recorded, mainly spontaneous discourse around 
elicited / guided topics. An extensive community validation process was undertaken 
before signs (including variants) were entered in the dictionary. 

The existence of these dictionaries contributed to legal recognition of New Zea- 
land Sign Language in 2006 (McKee 2006). Official language status and disability 
access measures have subsequently made NZSL more visible in public domains. 
Deaf NZSL users increasingly participate in wider social, political, occupational and 
educational domains, leading to rapid lexical development of NZSL in these fields. 
This parallels lexical expansion seen in the national indigenous language, Te Reo 


1 It is conventional in linguistics literature to use the phrase ‘signed languages’ when referring to 
languages in this modality in a general or collective sense (cf. ‘spoken languages’ or ‘written 
languages’). However, the proper names of specific national languages in English take the form ‘(New 
Zealand / American / British . . .) Sign Language’. 
2 http://www.stats.govt.nz/Census/2013-census/profile-and-summary-reports/quickstats-culture-identity/languages.aspx (last access: 10 June 2022). 
Māori, as an outcome of recognition and revitalisation (Harlow 1993). The Deaf 
community’s participation in new domains is typically mediated by interpreters, 
who are challenged by lexical inequivalence between English and NZSL. 

Representing a visual-gestural language with static images (in the absence of a 
written form) is a key challenge in signed language lexicography (McKee/McKee 
2013). Improvements in digital media and data storage enabled the creation of the 
Online Dictionary of New Zealand Sign Language (ODNZSL) with video content. Tak- 
ing the dictionary online included revising and revalidating existing data and add- 
ing further entries and video material (with corpus-derived but edited example 
sentences). Further entries have been added in batches, with the most recent up- 
date in 2017. By signed language dictionary standards, the 6,000 or so entries in 
the ODNZSL make it a reasonably large and comprehensive dictionary; signed lan- 
guage lexicons are relatively small due to limited lexicalisation, the capacity of pro- 
ductive forms to express novel context-dependent meanings, and the fact that 
signed languages were historically used in limited domains (Johnston 2012). The 
dictionary is a general-purpose dictionary primarily aimed at L2 learners rather 
than at the Deaf community, and for this reason the initial focus was on document- 
ing high frequency signs. A user study found that use of dictionary content in teach- 
ing materials is a primary function for Deaf NZSL users and that it may also have an 
authoritative / standardising role, but is rarely used by Deaf NZSL users to look up 
the meaning or form of unknown signs (Vale 2015). Corpus work in NZSL has been 
undertaken in projects from the 1990s, but annotation of signed language corpora 
is complex and labour intensive, and the dictionary does not have access to a 
highly contemporary corpus from which to source current neologisms. 

To leverage the Deaf community’s increasing online presence, the web-based 
platform NZSL Share was launched in March 2020 to crowdsource new and previ- 
ously undocumented signs, and to encourage community validation of these signs. 
The platform allows users to upload sign videos, comment on videos and agree or 
disagree with (often new) signs being proposed. It is managed by the research team 
that maintains the ODNZSL, which includes the authors. NZSL Share is being used 
by individuals as well as Deaf community groups to record and share signs of a spe- 
cialist nature (e.g., school curriculum signs). NZSL Share now has close to 50 
actively contributing members. Its launch coincided with the 2020 COVID-19 out- 
break in New Zealand and so some of the first signs contributed were COVID-19- 
related, which are the focus of this paper. 


2 COVID-19 in New Zealand 


The first COVID-19 case in New Zealand was reported on 28th February 2020 and by 
the end of March, the entire country was required to comply with a full lockdown 
(known locally as Alert Level 4) with the aim of eliminating COVID-19 from the 
community. During this time, the government and public health officials broadcast 
daily updates through television, radio and print media. It was vital that these com- 
munications reached all communities rapidly and so NZSL translation of print infor- 
mation was commissioned through agencies associated with the Deaf community, 
and NZSL interpreters were deployed at official televised briefings (also posted on- 
line). Interpreters and translators were thus at the front line of communicating new 
information to the NZSL community, always working under time pressure, with few 
reference sources and, along with the rest of the population, encountering new 
information and jargon as the pandemic unfolded daily. As such, interpreters and 
translators became de facto language innovators, generating translations and 
establishing terms ahead of Deaf community usage. Translation-driven lexical 
innovation is common when a minority language is used to translate information in 
public domains, as with Irish for example (Ní Ghearáin 2011). The Deaf NZSL 
community could not contribute greatly to creation of terminology at the outset of the 
COVID-19 pandemic because they were also grappling with the new information 
and concepts conveyed to them via translation. Furthermore the whole population 
was isolating at home which restricted discourse in NZSL at a community level 
about COVID-19, beyond online video interaction. While novel lexicon is the focus 
of this paper, terminology was just one of many significant challenges in mediating 
information to the NZSL community during the pandemic. 


3 Method 


We aimed to investigate translators’ and interpreters’ strategies for dealing with the 
demands of new terminology and lexical inequivalence, and their observations 
about the conventionalisation and dissemination of COVID-19-related signs that 
they used. We also wanted to explore how and when such neologisms could be en- 
tered in the ODNZSL. To gather data, we catalogued signs related to COVID-19 that 
were contributed to NZSL Share, and conducted two focus group interviews with: 
(1) interpreters who had interpreted briefings on TV (hearing L2 NZSL users, profes- 
sionally trained), and (2) translators who had produced NZSL versions of public 
health information bulletins (Deaf L1 NZSL users, bilinguals with experience but no 
formal training). 

Focus group interviews sought to elicit vocabulary that prompted innovation in 
translation, the strategies that participants used to convey new terms and concepts 
on the spot, and observations around the development and dissemination of coro- 
navirus-related signs among interpreters/translators and into the wider community. 
4 Findings 
4.1 Novel terminology in COVID-19 related information 


From the signs contributed to NZSL Share and from interview data, we identified 
types of novel terms and phrases in English that created challenges for translators 
and interpreters, and which therefore could trigger neologisms in NZSL. These in- 
cluded not only COVID-19-related terms but also adjacent vocabulary relating to 
economic and social aspects of the pandemic response. We loosely categorise this 
vocabulary below in terms of reasons for lexical challenges in NZSL (see Table 1). 


Table 1: Categories of challenging terms and phrases in translation. 


A. Medical / testing related terms - English (technical) 
Antibody, community transmission, coronavirus, covid-19, covid-positive, covid-negative, dose, 
epidemic, epidemiological link, genome testing, herd immunity, nasopharyngeal swab, negative 
pressure room, pandemic, PPE, screening, strain, vaccine, vaccine rollout, virus 

B. Other new / extended / reframed concepts - (NZ) English 
alert levels 1-4, bubble, case, casual (+) contact, close contact, eliminate, eradicate, essential 
services, lockdown, mask, MIQ / managed isolation, places of interest, quarantine, self-isolation, 
social distancing, team of five million, trans-Tasman bubble 

C. Lexical gaps in NZSL / difficult to translate concepts 
border, closing the border, hygiene, fiscal, mortgage holiday, notice (official Government notice), 
Reserve Bank, rent freeze, road block, support package, symptoms 


Firstly, many terms that frequently occurred in the government information and media 
briefings were technical medical terms already in use in English (with the exception 
of COVID-19). Some of the terms in Category A might be reasonably common (e.g., 
epidemic, vaccine, immunity) but others would previously have had limited use beyond 
the medical/scientific community (e.g., genome sequencing, negative pressure room). 

Category B consists of terms that were either neologisms in NZ English or that 
were used in an extended or specific sense in relation to COVID-19 (such as lock- 
down, alert levels, bubble, essential workers). 

Finally, Category C contains terms in the source language that were not new or 
not directly related to COVID-19, but were nevertheless challenging because no 
equivalent signs exist in NZSL (such as symptoms, border). Some of these lexical 
gaps arose in relation to jargon around economic and social policy responses (e.g., 
Reserve Bank, employment support package, mortgage freeze). 
4.2 Strategies for lexical innovation and types of resulting NZSL coinages 


Known strategies for lexical innovation include semantic extension; coinage of new 
words through language-internal mechanisms such as derivation or compounding; 
and drawing on language-external resources, as calques or direct loans. The extent 
to which specific strategies are used and are deemed acceptable may vary according 
to the preferences of the language community (Jernudd 2013). 

Proposed COVID-19 related signs contributed to NZSL Share as well as transla- 
tional equivalents discussed by interpreters and translators in our focus groups in- 
clude examples of both language-internal and language-external lexical innovation 
strategies (see Table 2). These examples reflect processes of sign creation found in 
the NZSL lexicon generally, as evident from signs entered in the ODNZSL and from 
contributions to NZSL Share. 


Table 2: Types of lexical innovation in NZSL coinages and translational equivalents. 


Lexical innovation strategy: NZSL equivalent 


NZSL linguistic resources 

Paraphrasing (circumlocution): 
- MIQ (managed isolation and quarantine) > STAY^HOTEL³ 
- pandemic > ILL^SPREAD^WORLD 
- eliminate > COVID^STOP 

Hypernyms expanded to a list: 
- symptoms > FEVER, SORE-THROAT, COUGH 
- PPE > MASK, GLOVES, APRON 
- hygiene practices > WASH-HANDS, COUGH-INTO-ELBOW 

Grammatical restructuring, e.g., nominal referents > verb phrases: 
- Trans-Tasman bubble > AUSTRALIA NZ PLANES-FLY-reciprocal 
- transmission > PERSON^PASS-ON-multiple 

Productive morphology (depicting / visually motivated): 
- SOCIAL-DISTANCING, QUARANTINE, ALERT-LEVELS 

Semantic extension: 
- SPREAD > ‘EPIDEMIC’ 
- GROUP > ‘CLUSTER’ 


External linguistic resources 

Calques (from English): 
- ESSENTIAL^OCCUPATIONS, LOCKDOWN, COVID^POSITIVE, CLOSE^CONTACT 

Loan sign (from other sign languages): 
- CORONAVIRUS 


3 In this paper we follow the convention of representing lexical signs with capitalised English glosses. 
We note that a large proportion of ‘signs’ entered in NZSL Share are actually 
phrasal (multi-sign) translations of a concept, rather than lexicalised coinages. A further 
common strategy to express novel meaning in signed languages is the use of productive 
morphology to construct complex predicates, often motivated by visual properties of 
the referent. For example, Figure 1 shows common productive constructions in which 
the upright index finger represents a person, and Figure 2 shows how the same produc- 
tive elements are used in the coinage of an equivalent for social distancing. 


[Figure 1 image: panels labelled ‘person’, ‘approach’, ‘meet’, ‘turn and walk away’] 


Figure 1: Complex predicates using the productive PERSON handshape. 


Figure 2: 'SOCIAL-DISTANCING'. 


The use of such strategies in relation to COVID-related lexical innovation is consistent 
with an investigation of health-related terminology in Auslan (Australian Sign Lan- 
guage), in which relatively few terms were found to have a conventional lexicalised 
form, but rather were expressed by depicting strategies (Major et al. 2012). 

Polysemy is prevalent in NZSL, and accordingly, lexical extension is used 
liberally for expressing new COVID-19 related meanings: a novel contextual 
meaning is attached to an existing sign by mouthing the corresponding English term with 
the sign (McKee 2007). 

Although unrelated in both modality and structure to the dominant spoken lan- 
guages that surround them, signed languages are subject to constant influences aris- 
ing from close language contact. Calques from the spoken language are therefore 
relatively common, especially for two-part terms or phrases, as reflected in existing 
NZSL dictionary entries such as open-minded.⁴ 

Contact and borrowing between national signed languages is a common phenome- 
non. The visual-gestural production modality of signed languages means that they 
tend to share more phonological and morphological material (especially visually moti- 
vated elements) than spoken languages, which facilitates the sharing of lexicon across 
language boundaries (Quinto-Pozos/Adam 2015). Borrowing in the context of COVID-19 
is therefore consistent with a general trend for NZSL users to readily adopt vocabulary 
from other signed languages to fill lexical gaps or expand the lexicon, and online expo- 
sure to texts in other signed languages seems to be accelerating this trend (McKee/ 
McKee 2020). In the current study we identified four loans from overseas signed lan- 
guages, which were apparently acquired from foreign online sources. Chief among 
these is the sign CORONAVIRUS / COVID, which is anecdotally said to have originated 
in Japan and was widely adopted into many signed languages early in the pandemic. 


4.3 Interpreters’ and translators’ use of lexical innovation strategies 


Interpreters and translators may be agents of language change by introducing and 
disseminating neologisms to the target language community through their 
renditions (Lenihan 2018). The same typical lexical innovation resources discussed 
above are available as translation strategies in response to novel concepts or source 
text neologisms, or may be introduced into the target text as idiosyncratic usage by the 
interpreter/translator (Niska 1998). Which strategies are prevalent in translations is 
affected by the general trends of the target language, but may also vary according 
to individual interpreters (Van Obberghen 2016). The constraints of simultaneous 
interpreting (or short-notice translation) may also influence the use of certain strat- 
egies. For example, calques from English may be a default (but temporary) response 
when first hearing a neologism or unfamiliar term. 

Interpreters and translators in our focus group interviews demonstrated a high 
level of awareness and concern about their potential influence on NZSL language 
change. Although as mentioned above, some new coinages may be the direct result 
of the demands of working under time pressure, our research participants indicated 
that for the most part, they made conscious choices about the strategies they used. 
Primarily, they tried to avoid coining neologisms. This was largely due to the imper- 
ative to make information accessible to the Deaf community in language that would 
be readily understood, at a time when the Deaf community was still unfamiliar with 
the English term or concept and thus had no referent for new signs. For the same 


4 https://www.nzsl.nz/signs/5661 (last access: 10 June 2022). 
reason, our research participants were wary about using calques from English such 
as COVID-positive. Thus, the demand for lexical and translational innovation driven 
by novelty in the source message was in tension with considerations of 
comprehensibility for the target language audience, among whom health literacy is also 
lower than in the general population (Witko et al. 2017). Interpreters and translators 
reported that rather than creating new “terms”, their focus, especially in early 
communications, was to paraphrase and expand new terms with examples to maximise 
transparency and understanding. For example, describing someone as ‘having 
COVID’ was considered preferable to using a calque such as COVID-positive, 
because ‘positive’ in NZSL is more likely to be understood in its usual sense of a 
‘desirable attribute/attitude’ rather than the intended technical sense of the virus being present. 

A further reason for avoiding neologisms was that the conditions of lockdown 
and time pressure to render information meant that interpreters and translators had 
limited access to Deaf community feedback with regard to their understanding and 
uptake of any such neologisms. Our research participants also reported working 
mainly in isolation with limited opportunities to discuss new terms in the source text 
with colleagues, especially at the beginning of the pandemic. As a result, transla- 
tional equivalents were variable and at times idiosyncratic, causing further concern 
that the Deaf target audience would not be able to associate these variable 
translations with the new concepts and English terms. In addition, hearing interpreters 
especially were conscious of language authenticity considerations as second language 
users of NZSL (and indeed they reported some negative comments from Deaf NZSL 
users in social media about their vocabulary choices or apparent innovations on the 
basis that they were used by non-deaf interpreters). 

Together, these concerns for comprehensibility and language authenticity may 
have predisposed our research participants to create translational equivalents using lan- 
guage-internal strategies, including semantic extension, paraphrasing, grammatical re- 
structuring (changing nominal referents to verb phrases; rendering hypernyms as a 
list), and using productive morphology to create ‘nonce’ constructions with contextual 
reference. Since similar signed language interpreting activity was occurring in many 
countries, these somewhat parallel online texts also offered a resource for browsing lexi- 
con and translational strategies, in a few cases leading to the introduction of loan signs. 


5 Discussion 
5.1 Status of COVID-19-related lexical innovation 


Although interpreters and translators had to exercise creativity to render a 
proliferation of COVID-19 related terms and concepts, many of the strategies they employed 
did not lead to lexical neologisms in NZSL. While extended paraphrases were 
progressively shortened, and some productive forms and lists of hyponyms over 
time became conventionalised translational equivalents, their status as fixed lexical 
signs or sign phrases is uncertain. This is partly a reflection of the nature of NZSL 
lexical innovation processes in general. As we noted in section 4.2, productive de- 
picting constructions convey context-specific meanings; however their reference is 
not fully specified when decontextualised. 
Examples of productive depicting constructions used in the context of COVID- 
19 terms are: 
- quarantine (indicating a fenced-off area); 
- mask (showing a mask stretching over the nose and mouth); 
- nasopharyngeal swab (showing a swab being inserted into the nose); 
- social distancing (the upright index fingers of both hands depicting persons, 
moving apart); 
- trans-Tasman bubble, i.e. quarantine-free travel between Australia and New 
Zealand (the two hands representing planes flying in reciprocal directions). 


Many of these constructions can have a range of contextual meanings. For example, 
any fenced-off area could be described with the same construction that is used for 
quarantine, and the construction used to describe social distancing could also be 
used in the general sense of people ‘standing apart’ or ‘avoiding’ each other. The 
specific intended meanings of such constructions in relation to COVID-19 may not 
be retrievable outside of the context of the immediate translation or interpretation. 
Thus, it would be difficult to justify listing the form ‘two planes flying in reciprocal 
directions’ in a dictionary with the sense trans-Tasman bubble, for example. 

Similarly, the strategy of rendering hypernyms as lists of category members 
may be context-dependent and even when largely conventionalised, such lists can- 
not be said to have fixed lexical status (Kennedy et al. 1997). 

Some terms had a lexical character, but had variable form across different indi- 
viduals and contexts of use. An example is border, which was hitherto a low fre- 
quency concept in NZSL discourse, perhaps in the absence of land borders in New 
Zealand. (Interestingly, the sign that appears in the ODNZSL⁵ is exemplified by a 
sentence about the border between USA and Mexico, suggesting that this sign is 
seldom used with local reference.) Interpreters explained that their translations of 
border and border workers in the COVID briefing situation varied according to the 
specific referent — e.g., sea port, airport, or state line (in reference to travel restric- 
tions within Australia). When a generic term was unavoidable (e.g., a phrase such 
as border closure), they indicated a line or boundary in various ways, but doubted 
that these varying forms would become conventionalised given low frequency use 
beyond this situation. 


5 https://www.nzsl.nz/signs/480 (last access: 10 June 2022). 
5.2 Implications for NZSL lexicography 


The lexicographical treatment of NZSL neologisms, including new coinages arising 
from the COVID-19 context, has to be considered against the background of our 
past and present lexicographical practices and the purpose and format of the 
ODNZSL. This dictionary and its precursors, as mentioned previously, used corpus 
evidence and community validation processes in the documentation of high fre- 
quency signs for general purposes. It is clear from the findings of the current study 
as well as from our ongoing lexicographical work that although similar language 
innovation processes are at work, many recent neologisms are of a different nature 
to previously documented high frequency signs. Not only are they used and recog- 
nised by much smaller subsets of the language community, but they often arise 
from interpreted or translated English material (in specialised areas) rather than 
spontaneous community usage. 

In recent years, the ODNZSL has broadened its scope and has entered a number 
of NZSL neologisms in specialist areas such as school science and mathematics cur- 
ricula, linguistics, and local place names. Expanding an existing online dictionary 
with neologisms requires changes in methodology to collect and validate data, as 
well as extensive revisions to the web application to meet diverse user needs (Ex- 
pertisecentrum Vlaamse Gebarentaal n.d.). Although broadening the scope of the 
ODNZSL has already required some procedural changes, it is likely that a number 
of core principles regarding the addition of new entries will remain unchanged. 
When we asked our research participants how we could determine which, if any, of 
the COVID-19 related translational ‘innovations’ should be entered in the dictio- 
nary, their suggestions were consistent with these core criteria: 

— The coinage should be a fixed lexical sign or sign phrase, not a one-off coin- 
age or construction that only has reference within the immediate context. 

— The coinage must have longevity and transferability - i.e., use beyond the 
context of COVID-19 briefings. Since the initial 2020 lockdown in New Zealand, 
the focus of COVID-19 reporting and discourse has continually changed and 
some of the original terms used in English and NZSL have altered or reduced in 
frequency. While such coinages might be of historical interest, the primary 
functions of the ODNZSL would not be well served by the inclusion of short- 
lived coinages that are no longer current. 

— Wider use: there must be evidence of the sign being used by the Deaf commu- 
nity, beyond just translators and interpreters. 


Very few of the terms mentioned in our findings meet these criteria. Perhaps unsur- 
prisingly, the most cited case of an established new “sign” in our data is the loan 
sign CORONAVIRUS / COVID, which now shows evidence of widespread commu- 
nity usage in New Zealand. In addition, a small number of productive depicting 
constructions (such as MASK and NASOPHARYNGEAL-SWAB) are sufficiently lexicalised
to consider entering in the dictionary.

Over time, further COVID-19 related signs may stabilise and meet these criteria, 
while other terms may fall out of use or not be taken up by the Deaf community. We 
will continue to monitor the coinages discussed in this paper as part of a wider re- 
search project investigating recent vocabulary growth in NZSL and the prevalence 
of language-internal vs. language-external factors in new sign creation. Since it is 
not possible to automatically extract relevant terms from video texts, we foresee a 
significant role for NZSL Share as a crowdsourced repository for new terms. 

Due to the circumstances in 2020 and 2021, it has not yet been possible to effec- 
tively recruit community contributors to NZSL Share. As a result, NZSL Share was of 
limited use as a tool or strategy for rapid sharing of neologisms during the first 
wave of the pandemic. The rate of new terminology quickly outstripped community 
capacity to innovate and record equivalents. In practice, the interpreters on daily 
TV briefings became the most visible daily source of new vocabulary or phraseol- 
ogy. Some coinages recorded in NZSL Share were found to be idiosyncratic and 
novel, thus were not useful to interpreters and translators to communicate to a 
wide community audience (e.g., an individual's coinage for antibody). Translators 
and interpreters reported that while they looked for vocabulary in NZSL Share and 
the ODNZSL, more frequently they referenced each other's work to standardise their
vocabulary usage as far as possible. Thus, the process of vocabulary creation and 
dissemination became somewhat self-referential without an effective standardising 
or advisory mechanism, which was not possible to organise effectively under the 
restricted circumstances in which this process unfolded. In spite of these limita- 
tions, community reactions to NZSL Share have been very positive and uptake by 
individuals and groups (such as the national Deaf education provider) is gradually 
increasing as we continue to promote the platform. 

We note that other signed language dictionaries are grappling with similar meth- 
odological and lexicographical issues with regard to new signs. The Woordenboek 
Vlaamse Gebarentaal (Flemish Sign Language online dictionary) now includes an in- 
terface to allow for crowdsourced contributions; an expert validation committee 
meets several times a year to discuss such contributions and other neologisms identi- 
fied through linguistic research. The validation status of signs in various regions is 
marked on entries in this online dictionary, with unvalidated signs shown as “not yet 
known”. This approach allows the Flemish Sign Language dictionary to make new 
sign terminology (including COVID-19 related signs) available online quickly. 

Whilst we acknowledge the potential benefits of documenting the NZSL lexicon 
in one place, we anticipate that NZSL Share will be maintained as a separate web- 
site at present. As a separate platform, NZSL Share can include community contri- 
butions that do not (yet) meet the criteria of fixed lexical form and longevity as well 
as signs that are not typically included in a non-specialist dictionary, such as brand 
names or name signs used with the Deaf community to refer to public figures. It 
will provide a forum for consensus-building and dissemination of new signs in the
NZSL-using community, in the absence of a language planning body or expert com- 
mittee. The platform will also allow us to adapt our processes to include online vali- 
dation with specific groups of language users. At the same time, the ODNZSL can 
continue to be a trusted resource of a community-validated lexicon, and a consis- 
tent format can be maintained for dictionary entries, which include learner-focused 
example sentence videos as well as grammatical and user information that require 
an editorial role. 


6 Conclusion 


This case study of COVID-19 related lexical innovation in NZSL has shown that the 
main driver for new terminology has been live interpreting and translation of govern- 
ment and public health information. There has been rapid generation of new coin- 
ages in both directly COVID-19 related and adjacent fields, using both language- 
internal strategies (semantic extension, paraphrasing, grammatical restructuring, 
productive morphology) and language-external resources (calques and loans). Inter- 
preters and translators as the primary source of this lexical innovation showed a high 
level of concern for language authenticity and comprehensibility, which influenced 
the strategies they chose to render new terms and concepts into NZSL. 

Very few of the new coinages meet the criteria for being entered in the ODNZSL, 
due to the uncertain lexical status of some constructions, variable and at times idio- 
syncratic usage, and difficulties in determining dissemination and adoption of new 
signs in the wider NZSL community. 

While COVID-19 related lexical development therefore will not have an immedi- 
ate impact on the ODNZSL, this study has implications for the role(s) and format of 
the dictionary and highlights potential changes required in our lexicographic pro- 
cesses to account for the nature of NZSL neologisms. 

Although it was found to be of limited immediate use as a tool for rapid sharing 
of neologisms during the first wave of the pandemic, it is expected that the crowd- 
sourcing platform NZSL Share, launched in 2020, will facilitate collection, commu- 
nity validation and dissemination of sign neologisms. 
Franck Sajous

Using Wiktionary revision history to uncover 
lexical innovations related to topical events: 
Application to Covid-19 neologisms 


1 Introduction 


In April and July 2020, two extraordinary updates of the Oxford English Dictionary 
(OED) focused on the neologisms related to the Covid-19 pandemic. The responsive- 
ness of the OED was made possible by the ability of its team to monitor, analyse 
and report quickly a sudden inflow of lexical changes. This ability, while not 
unique, is not prototypical in the lexicographic landscape. Corpus lexicography ob- 
viously requires corpora, but also tools to process and query them and sufficient 
person-hours. Fulfilling these standard requirements simultaneously, however, is 
no trivial task. The tools are not an issue as far as lexical creations are concerned. 
Building a headword list is indeed not considered a “hard part of lexicography” 
(Kilgarriff 1998) and detecting formal neologisms to update a nomenclature only re- 
quires “simple maths” (Kilgarriff 2009). Identifying semantic changes is more chal- 
lenging. Clustering algorithms have been devised by Cook et al. (2013) while recent 
approaches use diachronic word embeddings (Fišer/Ljubešić 2018). These methods
enable the detection of cultural shifts and linguistic drifts (Hamilton et al. 2016) but 
error rates are generally high. Another issue is that prediction-based models are ap- 
propriate for the detection of semantic changes over long time spans (decades or 
centuries) in very large corpora but they rarely perform well with shorter time units 
and smaller corpora (Kutuzov et al. 2018). On the corpus side, appropriate text col- 
lections to be used as input for the tools (i.e. diachronic corpora updated on a regu- 
lar basis) are — sadly — not publicly available for most languages. Lastly, corpus 
lexicography also requires substantial manpower (ideally, trained lexicographers)
to analyse vast amounts of data in a reasonable timeframe. Most institutions how- 
ever, whether private or public, rarely have the manpower and the time they would 
like. The limitations are bound to the conditions of dictionary production rather 
than being intrinsic to corpus-based or corpus-driven approaches, as Landau (2001: 
323) explained: 
Dictionaries are not written in a vacuum, but by people working under the pressure of time. It
sometimes seems to me that as technology has improved the speed and power with which we 
can examine the language, the pressures to produce quickly and with fewer staff have kept 
pace, so that on balance nothing is accomplished any faster or better. The expectations of 
management seem to rise at the same rate as the speed and power of the computer increase 
[. . .] Corpora can be used well or they can be used badly. Time pressures too often push the 
lexicographer to cut corners to avoid time-consuming analyses. It really doesn't do much good 
having a good corpus with marvelous analytical tools if they aren't used. 


Time pressure and manpower are conversely not an issue in collaborative projects 
such as Wiktionary, which relies on massive online contributions performed by 
crowds of amateurs, not on corpus-driven analysis. Despite this questionable ap- 
proach to lexicography and the resulting weaknesses described, inter alia, by 
Hanks (2012) and Rundell (2017), the exhaustiveness and the responsiveness of 
Wiktionary can be leveraged to detect lexical changes. Sajous et al. (2018) showed 
how swiftly the crowds are likely to detect formal and semantic neology. For exam- 
ple, in 2017, 73% of the entries added to the OED were already recorded in Wiktion- 
ary, whose median lead time was 4 years. 

In the present contribution, I investigate if and how the English and French edi- 
tions of the Wiktionary collaborative dictionary can be used as a corpus for real time 
neology watch. This option is envisaged as a stopgap, when no satisfactory corpus is 
available. Wiktionary can also prove useful in addition to standard corpus analysis, 
to minimize the risk of overlooking new coinages and new senses. Since the collabo- 
rative dictionary’s quest for exhaustiveness makes the manual inspection of the new 
additions unreasonable (more than 31,000 English lemmas and 11,000 French lem- 
mas entered the nomenclature in 2020), identifying the possibly relevant headwords 
is an issue. The solution proposed here is to use Wiktionary revision history to detect 
the (new or existing) entries that received the greatest number of modifications. The 
underlying hypothesis is that the most heavily edited pages can help identify the vo- 
cabulary related to “hot topics”, assuming that, in 2020, the pandemic-related vocab- 
ulary ranks high. I used two measures introduced by Lih (2004), whose aim was to 
estimate the quality of Wikipedia articles: the so-called rigour (number of edits per 
page) and diversity (number of unique contributors per page). In the present study, I 
propose to adapt the rigour and diversity metrics to Wiktionary in order to identify 
the pages that generated a particular stir, rather than to estimate the quality of the 
articles. I do not subscribe to the idea that — in Wiktionary — more revisions necessar- 
ily produce quality articles (more revisions often produce complete articles). I there- 
fore adopt Lih’s notion of diversity to refer to the number of distinct contributors, but 
leave out the name rigour when it comes to the number of revisions. Wolfer and 
Müller-Spitzer (2016) used the two metrics to describe the dynamics of the German 
and English editions of Wiktionary. One of their findings was that the number of 
edits per page is correlated with corpus word frequencies. The variation in number of 
page edits should therefore reflect to some extent the variation of corpus word 
frequencies. Renouf (2013) established a relationship between the fluctuation of word
frequencies in a diachronic corpus and various neological processes. In particular,
she illustrated how specific events generate sudden frequency spikes for words previously
unseen in the corpus. For instance, Eyjafjallajökull, the (existing) name of an
Icelandic glacier, appeared in the corpus when the underlying volcano erupted in 
2010 and disrupted air traffic in Europe. In order to check if the same phenomenon 
occurs when using Wiktionary edits instead of corpus frequencies, I manually anno- 
tated the most frequently revised entries (according to various ranking scores) with 
the binary tag: “related to Covid-19” (yes/no). The annotations were then used to test 
the ability of various configurations to detect relevant headwords from the English 
and French Wiktionary, namely Covid-19 neologisms and related existing words that 
deserve updates. 


2 Methodology 


Scrutinising Wiktionary offers several opportunities for collecting Covid-related 
neologisms quite easily, depending on the language edition, and one’s ability to automatically
process the content of the dictionary. First of all, the Coronavirus category¹
of the English Wiktionary included 52 headwords on January 1st, 2021 and 124
in June. The English Wiktionary also has a category named Hot words newer than
a year.² These words are described in Wiktionary as “presumably failing the criteria
for inclusion on the spanning less than a year requirement”, but are kept, according
to Wiktionary, “because they have become widely used in that short time”, which
is precisely the subject of the present study. In January 2021, the category included
94 words, 26 of which were not English. 79% of the English words (54 out of 68) 
were related to Covid-19. This observation is encouraging in that it suggests that the 
2020 hot words are those related to the pandemic. Relying on the two categories 
mentioned is probably a good start, but by no means a satisfactory solution. First, 
some headwords that would deserve to be classified in these categories are not. Sec- 
ond, headwords that are related, but not specifically, to Covid-19, do not necessarily 
fit into these categories. Third, the goal of the present study was to develop a 
method for discovering topical neologisms that can be adapted to other language 
editions and other topics. In the French Wiktionary, there is no such thing as a “co- 
ronavirus” or a “hot words” category, and such categories will not make it possible 
to discover neologisms related to other “hot topics” in the future. Looking for some 
patterns (covid, corona, etc.) in the headword list, the definitions and the usage 


1 https://en.wiktionary.org/wiki/Category:en:Coronavirus 
2 https://en.wiktionary.org/wiki/Category:Hot_words_newer_than_a_year The page also contains a
link Hot words older than a year, to which some of the 2020 hot words have been moved. 
examples help to harvest relevant headwords (e.g. covidiot, covid party, coronasceptic,
coronaviruslike, etc., and long-hauler, defined as ‘a COVID-19 patient who is
suffering from [. . .]’). However, the method fails to retrieve words that are not mor- 
phologically derived from the patterns and that are related but not specific to the 
pandemic, i.e. headwords whose defining words do not match such patterns (e.g. 
no word matches the patterns in the definition and usage examples of social distancing).
By contrast, Wiktionary revision logs have the same format in all language editions, and the
number of edits/editors per page can be extracted regardless of any target topic.
Details on the processing required to exploit the logs are given in Section 2.1. 
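
The pattern-based harvesting discussed above, and its blind spot, can be illustrated with a minimal Python sketch. The two patterns come from the text; the function name `matches_patterns` and the abridged definitions are hypothetical placeholders, not actual Wiktionary content.

```python
import re

# Patterns named in the text ("covid, corona, etc."); the list is illustrative.
PATTERNS = re.compile(r"covid|corona", re.IGNORECASE)

def matches_patterns(headword, definition="", examples=()):
    """Return True if a pattern occurs in the headword, definition or examples."""
    fields = [headword, definition, *examples]
    return any(PATTERNS.search(field) for field in fields)

# Hypothetical, abridged entries (assumed wording, for illustration only).
entries = {
    "covidiot": "a person who ignores public health advice",
    "long-hauler": "a COVID-19 patient who is suffering from [...]",
    "social distancing": "the practice of keeping physical distance from others",
}

hits = [hw for hw, definition in entries.items() if matches_patterns(hw, definition)]
print(hits)  # social distancing is missed: nothing in its entry matches a pattern
```

Under these assumptions, covidiot is found through its form and long-hauler through its definition, while social distancing goes undetected, which is the limitation that motivates the use of revision logs.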


2.1 Data processing 


The history dump of Wiktionary is a large file released on a regular basis, which contains
every version of all articles, stored after each individual contributor's edit.
For each revision, the username of the contributor, or the IP address (for unregistered
users) is provided, as well as the revision date. The files released on January 1, 2021
were downloaded³ and processed for the English and French editions of Wiktionary
so as to extract, for each month and for each article, the number of revisions and the
number of unique contributors.⁴ Several pre-processing steps were performed to discard
data irrelevant to the present work:

- discussion pages, user talk pages, etc. 

- parts of speech other than common noun and proper noun, verb, adjective and 
adverb. 

— pages related only to inflected forms or entries in a language different from the 
target language (e.g. the page of the English Wiktionary which describes the 
Vietnamese headword siêu vi corona ‘coronavirus’ was ignored).

— revisions related only to other language sections. For example, the revision 
dated 5 November 2020 on the coronavirus page of the English Wiktionary, that 
resulted in the addition of the derived term coronaviraal to the Dutch language 
section was ignored. 


After discarding irrelevant pages and revisions, more than 14 million revisions re- 
mained for the English Wiktionary and more than 27 million for the French lan- 
guage edition. 
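
The core of this extraction (counting revisions and unique contributors per article and per month) can be sketched as follows. This is a simplified illustration, not the pipeline actually used in the study: it assumes the standard MediaWiki XML export layout (`page`, `title`, `ns`, `revision`, `timestamp`, `contributor` with `username` or `ip`), keeps only main-namespace pages, and takes a caller-supplied set of bot account names. The function name `monthly_activity` and its parameters are hypothetical, and the filtering by part of speech and by language section described above is not reproduced.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]

def monthly_activity(xml_stream, bot_names=frozenset(), keep_anonymous=True):
    """Return {(title, 'YYYY-MM'): (revisions, unique contributors)}."""
    revs = defaultdict(int)
    editors = defaultdict(set)
    title = ns = None
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "ns":
            ns = elem.text
        elif tag == "revision":
            if ns == "0":  # main namespace only: skips talk and user pages
                timestamp = contributor = None
                is_ip = False
                for child in elem.iter():
                    name = local(child.tag)
                    if name == "timestamp":
                        timestamp = child.text
                    elif name == "username":
                        contributor = child.text
                    elif name == "ip":
                        contributor, is_ip = child.text, True
                wanted = keep_anonymous or not is_ip
                if timestamp and contributor and contributor not in bot_names and wanted:
                    key = (title, timestamp[:7])
                    revs[key] += 1
                    editors[key].add(contributor)
            elem.clear()  # free memory: history dumps are very large
        elif tag == "page":
            elem.clear()
    return {key: (revs[key], len(editors[key])) for key in revs}

# Tiny hand-made sample in the export layout (invented users and dates).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>social distancing</title><ns>0</ns>
    <revision><timestamp>2020-03-02T10:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor></revision>
    <revision><timestamp>2020-03-05T11:00:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor></revision>
    <revision><timestamp>2020-03-09T12:00:00Z</timestamp>
      <contributor><username>FormatBot</username></contributor></revision>
  </page>
  <page><title>Talk:social distancing</title><ns>1</ns>
    <revision><timestamp>2020-03-03T09:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor></revision>
  </page>
</mediawiki>"""

stats = monthly_activity(io.BytesIO(SAMPLE.encode("utf-8")), bot_names={"FormatBot"})
print(stats)
```

The `keep_anonymous` switch makes it possible to compute the counts with or without revisions identified only by an IP address.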

Studies that focus on Wiktionary (or Wikipedia) revisions differ in whether they 
take into account the revisions performed by bots and by anonymous users. A bot is 


3 https://dumps.wikimedia.org/
4 The computing is an extension of the work done by Sajous et al. (2020) to produce WIND, a re- 
source which contains the dates of inclusion of Wiktionary headwords. 
a program devised to automatically perform specific types of revision targeting a
range of articles (mainly formatting, importing audio files, etc.). Such automatic edits
amount to 45% of the revisions in the English Wiktionary and 62% in the
French edition. A contributor to Wiktionary may be identified by a registered ac- 
count or an IP address. Regular contributors generally create an account while oc- 
casional contributors may edit an article “anonymously”. Dismissing or taking into 
account anonymous revisions (which represent 7.4% of the revisions in the English 
Wiktionary and 4.7% in the French Wiktionary) is a matter of debate. They are dis- 
carded in some studies on the grounds that anonymous users are less experienced 
or trustworthy than registered users and that identifying Internet users by their IP 
addresses is a rough approximation. Several contributors can indeed use the same 
IP,⁵ while a given contributor can use several IPs,⁶ as was the case during the present
study (see the discussion on myroblyte in Section 3.1). The objection could however
apply to registered accounts too: different contributors sharing the same
account is probably an exception, but it happens that a single user owns several 
accounts. Since the present study is concerned with the tendency of articles to be 
revised many times by possibly many people (not only experienced or reputable 
Wiktionarians), I was tempted to argue that there is no a priori reason for ignoring 
anonymous contributions (while revisions performed by bots are not relevant). The 
quantitative investigations presented in Section 3.1 show that there is no definitive 
answer as to whether considering or ignoring anonymous contributions is the best 
option. Regarding qualitative considerations, discarding anonymous contributions 
poses a risk of overlooking relevant words. For instance, in 2020, the articles for the 
synonyms R0, basic reproduction number and basic reproduction ratio were created
from the same IP address. Whatever their rankings, these words would have gone 
unnoticed if anonymous contributions had been ignored. 

Regarding existing headwords (i.e. those created prior to 2020), it is not difficult 
to detect new senses added to Wiktionary in 2020 by using the revision log, but the 
main focus in the present study is on any kind of updates: additions, modifications 
or replacements of definitions, usage examples, semantic relationships, transla- 
tions, usage notes, etc. Beyond new meanings, such revisions may indicate the 
need for article reviews (cf. the examples of ventilator in Section 2.4 and comorbid- 
ity, Section 3.5), which is information that lexicographers may find useful. 


5 Especially contributors accessing the Internet from behind an institution firewall. 
6 Either intentionally, to deliberately mask one’s identity, or unintentionally, as a result of dy- 
namic IP assigning. 
2.2 Ranking the new headwords


Looking for the new articles that have the maximum number of revisions and con- 
tributors should make it possible to detect headwords related to topical events, no- 
tably the Covid-19 pandemic. Figure 1 illustrates the cases of social distancing and 
flatten the curve. Dotted lines correspond to the number of revisions/contributors 
per month, while plain lines represent the total number of revisions/contributors 
since the creation of the articles. 


(a) social distancing (b) flatten the curve [legend: revisions, revisions total, contributors, contributors total]


Figure 1: Number of revisions and contributors for social distancing and flatten the curve. 


The lines for the two words follow the same pattern: a sudden spike of activity 
when the page is created (typically, when a word comes into usage) and an off- 
peak period, with occasional contributions (this revision pattern is reminiscent of 
the pattern described by Renouf (2013) concerning the corpus frequency of Eyjafjallajökull,
as mentioned in the Introduction). An analysis based on a one-year span is
likely to detect the two words social distancing and flatten the curve, with the former 
ranking higher (note that the vertical scales in Figure 1 are different). Conversely, 
words such as, for example, cognitive bias, added in April 2020, which received 7 
contributions by 4 distinct human contributors, are unlikely to be detected. In addition,
the two Covid-related words are likely to appear in the first-quarter candidate
list when performing quarterly analyses.

Wiktionary headwords can be represented in a coordinate system whose axes 
correspond to the number of revisions and unique contributors of each headword. 
The 31,107 new headwords added to the English Wiktionary in 2020 are depicted by a 
scatterplot in Figure 2. The article COVID-19 was modified 115 times by 63 unique con- 
tributors and is therefore represented by the coordinate point (115, 63). A given coor- 
dinate point can correspond to several headwords. For example, 17,415 words have 
not been modified since their creation. They are all represented by the coordinate 
point (1, 1). In a less extreme case, the headwords self-isolate and Wuhan coronavirus
were each modified 16 times by the same number (11) of distinct contributors. They
are therefore represented by the same coordinate point (16, 11). All the points are
located along or below the diagonal line (i.e. the contributors=revisions line) because
there obviously cannot be more contributors than contributions for a given headword.
The points along the diagonal line are those for which each revision was
made by a distinct contributor. For instance, the words coronoia, Wuhan shake and
Zoombombing were modified 8, 7 and 4 times, respectively, each time by a different
contributor.

Figure 2: Distribution of the 31,107 lemmas added to the English Wiktionary in 2020.

This kind of diagram enables a geometrical interpretation of the headwords’ location.
The rightmost points of the diagram (i.e. those with the highest abscissa) are
those corresponding to the most heavily revised pages. The upmost points (i.e. those 
with the highest ordinates) are those corresponding to the headwords edited by the 
greatest diversity of contributors. Given two points having the same abscissa, the upmost
point corresponds to the headword revised by a greater diversity of contributors.
For example, the two headwords flatten the curve and Medusavirus ‘a virus that
infects amoeba’ were each modified 30 times in 2020, and have a similar creation
date (February and March, respectively). However, the 30 edits of flatten the curve 
were made by 23 distinct contributors, compared to 8 contributors for Medusavirus. 

Four ranking scores were tested to detect potential Covid-related neologisms. 
Given a headword h, the ranking scores are defined as follows: 

1. $\mathit{revs}_h$ = raw number of revisions for h
2. $\mathit{contribs}_h$ = raw number of distinct contributors for h
3. $\mathit{dist}_h$ = distance from the origin of the coordinate system to the point ($\mathit{revs}_h$, $\mathit{contribs}_h$) = $\sqrt{\mathit{revs}_h^2 + \mathit{contribs}_h^2}$
4. $\mathit{prod}_h$ = product of the number of revisions and the number of contributors⁷ = $\mathit{revs}_h \times \mathit{contribs}_h$


The geometrical interpretation of the first three ranking scores is depicted in Figure 3 
(the product score has no geometrical interpretation). The revisions-based score or- 
ders the headwords from right to left. When two headwords have the same abscissa 
(i.e. the same number of revisions), they are ordered by their ordinate value (their 
number of contributors), i.e. the upmost headword is ranked first. For instance, the 
initially equally ranked (4th position) Wuhan pneumonia and Mount Mayon (a volcano
in the Philippines), whose coordinates are (44, 20) and (44, 4), are finally
ranked fourth and fifth. Similarly, the contributors-based score orders the head- 
words from top to bottom. When two headwords have the same ordinate (i.e. the 
same number of contributors), they are ordered by their abscissa value (i.e. their 
number of revisions), i.e. the rightmost headword is ranked first. For instance, the 
equally ranked (4th position) myroblyte (see Section 3.1) and flatten the curve, whose
coordinates are (72, 23) and (30, 23), are finally ranked fourth and fifth, which, in 
this case, is not the best option. Finally, the distance-based score orders the head- 
words according to their remoteness from the origin of the coordinate system. 
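
The four ranking scores and their tie-breaking rules can be sketched in Python as follows. The (revisions, contributors) pairs are those quoted in this section; the function name `rank` and the dictionary layout are illustrative choices. Since only seven headwords are included, the positions obtained here do not correspond to the ranks reported for the full 2020 data.

```python
import math

# (revisions, contributors) pairs quoted in the text (English Wiktionary, 2020).
HEADWORDS = {
    "COVID-19": (115, 63),
    "myroblyte": (72, 23),
    "Wuhan pneumonia": (44, 20),
    "Mount Mayon": (44, 4),
    "flatten the curve": (30, 23),
    "Medusavirus": (30, 8),
    "self-isolate": (16, 11),
}

def rank(headwords, score):
    """Sort headwords by one of the four scores, highest first."""
    keys = {
        "revs": lambda rc: (rc[0], rc[1]),        # ties broken by contributors
        "contribs": lambda rc: (rc[1], rc[0]),    # ties broken by revisions
        "dist": lambda rc: math.hypot(rc[0], rc[1]),
        "prod": lambda rc: rc[0] * rc[1],
    }
    key = keys[score]
    return sorted(headwords, key=lambda h: key(headwords[h]), reverse=True)

for score in ("revs", "contribs", "dist", "prod"):
    print(score, rank(HEADWORDS, score))
```

With the revisions-based score, Wuhan pneumonia is placed immediately before Mount Mayon; with the contributors-based score, myroblyte is placed immediately before flatten the curve, as described above.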


2.3 Ranking the existing headwords 


The scores introduced in Section 2.2, devised to rank Wiktionary new entries, are 
based on raw numbers of revisions and contributors. Using raw numbers to rank 
existing entries would not make any sense. We can indeed expect the revision rate 
of Wiktionary articles to depend on the nature of the entry, i.e. whether it is a fre- 
quent or a rare word, polysemous or monosemous, belonging to a specialised field 
or to the general language (knowing that these characteristics are related). For ex- 
ample, the larger spike observed in 2020 for coronavirus when compared to that ob- 
served for virus in Figure 4(a) is all the more noticeable as the article corresponding 
to the frequent and polysemous word virus is regularly revised, while the entry coronavirus
rarely is. Another telling example is the number of revisions of mask,
facemask and surgical mask, as depicted in Figure 4(b). If we consider the 2020 pe- 
riod globally, the three articles received a similar number of revisions (36, 36 and 
34, respectively). However, their “usual” yearly revision values are very different. 


7 The product can be normalised to values between 0 and 1 by dividing the score by the maximum
number of revisions and the maximum number of contributors. Normalising the product, however, 
is useless since it does not change the ranking order. 
Another way to uncover unusual increases in the number of edits is to represent
the total (cumulative) number of revisions (or contributors) for a given headword, as depicted
in the time-graphs in Figure 5. Figure 5(a) shows that revisions performed over sev- 
eral consecutive months may result in jumps that can be observed for the resulting 
period. Figure 5(b) shows the total number of revisions for mask, facemask and surgi- 
cal mask. The increase in the number of revisions is in line with the usual trend for 
mask, while the increases for facemask and surgical mask are more noticeable. 


(a) virus and coronavirus (b) mask, facemask and surgical mask


Figure 4: Monthly revision frequencies in the English Wiktionary. 
(a) monthly and total revisions for comorbidity (b) total number of revisions for mask, facemask and surgical mask


Figure 5: Monthly and total number of revisions in the English Wiktionary. 


The boxplot in Figure 6 statistically confirms these observations: with respective
mean values of 3.1, 5.9 and 16.7 (median values of 0, 2 and 15) for facemask, surgical
mask and mask, facemask was revised 12 times more than usual, surgical mask 5.8
times more and mask only 2.2 times more. The two upmost circles in the figure (which
represent extreme values) correspond to the 2020 number of revisions for facemask
and surgical mask (another extreme value, observed for facemask, corresponds to the
7 revisions made in November 2005 when the article was created). Conversely, the
2020 value for mask is not identified as an extreme value.

Figure 6: Distribution of the yearly number of revisions in the English Wiktionary.

Figure 7: Number of revisions for coronavirus in the English and French Wiktionary.

Regardless of the linguistic characteristics of the headwords, the revision rate 
may differ from one language edition to another. For example, the evolution of the 
number of revisions for the article coronavirus follows a similar trend in the English 
and French Wiktionary, but with different magnitudes (cf. Figure 7). 
Detecting a particular “stir” around a headword is like looking for the extreme
values of the boxplot in Figure 6. It therefore requires comparing the number of revisions
over a given period to the headword's usual revision rate, just as extracting keywords by
comparing a focus corpus to a reference corpus requires the use of relative frequencies,
not raw frequencies. The scores used to detect the most unusually revised articles
compare the number of revisions (or contributors) over a given period to the
usual (mean or median) number of revisions (or contributors) over similar time
spans. Given a target period p and headword h, the scores are calculated as follows:
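1. $\mathrm{avgRevsRatio}_p(h) = \dfrac{1 + \mathrm{revs}_p(h)}{1 + \mathrm{avg}(\mathrm{revs}(h))}$

2. $\mathrm{medianRevsRatio}_p(h) = \dfrac{1 + \mathrm{revs}_p(h)}{1 + \mathrm{median}(\mathrm{revs}(h))}$

3. $\mathrm{avgContribsRatio}_p(h) = \dfrac{1 + \mathrm{contribs}_p(h)}{1 + \mathrm{avg}(\mathrm{contribs}(h))}$

4. $\mathrm{medianContribsRatio}_p(h) = \dfrac{1 + \mathrm{contribs}_p(h)}{1 + \mathrm{median}(\mathrm{contribs}(h))}$
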


Medians and averages are calculated over the period that spans from the creation of
the article corresponding to the headword h to the month before period p begins. A
constant (here, 1) is added to the denominator (and to the numerator, for balance)
so as to avoid divisions by zero. The median value can be zero (as we saw above
with facemask), but the average value should not be, as all the articles have been
edited at least once (when they were created). However, certain revisions (performed
by bots or anonymous contributors) are discarded in some of the experiments
described below, which makes the addition of a constant necessary.
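
The four ratios can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the study: the function name `ratio_score` and the yearly counts below are hypothetical, chosen only to show how the added constant handles a zero median.

```python
from statistics import mean, median


def ratio_score(target_count, history, use_median=False):
    """(1 + count in the target period) / (1 + usual count over earlier periods).

    `history` holds the counts (revisions or contributors) observed over
    earlier time spans of the same length as the target period.
    """
    if not history:
        usual = 0  # nothing left after filtering: the constant avoids 1/0
    elif use_median:
        usual = median(history)
    else:
        usual = mean(history)
    return (1 + target_count) / (1 + usual)


# Hypothetical yearly revision counts before the target period.
history = {"mask": [15, 17, 18], "facemask": [0, 0, 3]}
target = {"mask": 36, "facemask": 36}

scores = {h: ratio_score(target[h], history[h]) for h in history}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these made-up counts, facemask ranks first although both headwords received the same number of revisions in the target period, because its usual rate is much lower. The same function applies to contributor counts.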

Ranking the headwords according to the slope of the curve for a given period
was tempting. The slope accounts for the increase in the number of revisions (or
contributors) over a given time span. For both the English and French language editions,
however, the scores based on slope values performed poorly. As the slope is equal to
the ratio between the number of revisions (or contributors) and the length of the
time span, its value is proportional to the raw number of revisions (or contributors),
and disregards the corresponding usual amount, which explains the poor results.
Figure 8, which consists of two enlargements of Figure 5(b), illustrates the situation
for mask and facemask. Although it is clearly visible in Figure 5(b) that the two
words have different usual revision rates, Figure 8 shows that their slope values over
the 2020 period are the same. The slope-based score was therefore abandoned and
is not further discussed.
Figure 8: Headwords with similar slope values over the year 2020.
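
The weakness of the slope can be illustrated with a small sketch. The count of 37 revisions in 2020 is a hypothetical value assigned to both headwords; the usual yearly means (16.7 for mask, 3.1 for facemask) are the ones quoted with Figure 6, as I read them. The function names are mine.

```python
def slope(count_in_period, period_length):
    """Average increase per time unit: proportional to the raw count."""
    return count_in_period / period_length


def avg_ratio(count_in_period, usual_count):
    """Ratio score: (1 + count in period) / (1 + usual count)."""
    return (1 + count_in_period) / (1 + usual_count)


# Hypothetical: both headwords receive 37 revisions over the 12 months of 2020.
revisions_2020 = {"mask": 37, "facemask": 37}
# Usual yearly means as quoted in the text for the boxplot of Figure 6.
usual_per_year = {"mask": 16.7, "facemask": 3.1}

slopes = {h: slope(n, 12) for h, n in revisions_2020.items()}
ratios = {h: avg_ratio(revisions_2020[h], usual_per_year[h]) for h in revisions_2020}

print(slopes)  # identical values: the slope cannot separate the two headwords
print(ratios)  # the ratio is clearly higher for facemask
```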


2.4 Annotation of headwords 


In order to assess the performance of the ranking scores, several sets of top-ranked
headwords were annotated for each metric introduced in Section 2.2 with the binary
flag ‘related to Covid-19’ (yes/no). New headwords were annotated to detect true
formal neologisms, or words that already existed but were too rare or too specialised
to enter a dictionary before. Existing headwords were annotated to detect semantic
neology or articles that potentially deserve an update. The interpretation of
the “relatedness to Covid-19” criterion encompasses words whose referents are in a
direct or indirect relationship with the virus and the disease, medical care, controlling
the spread of the pandemic, statistical analysis, consequences of the pandemic
on professional activities and social lives, as well as humorous coinages. For the
English and French language editions of Wiktionary, I annotated, for each data
source⁸ and each relevant ranking score (cf. Sections 2.2 and 2.3):

— the first 200 new headwords, ranked over the whole 2020 period; 

— the first 200 existing headwords, ranked over the whole 2020 period; 

— the first 100 new headwords, ranked over each trimester; 

— the first 100 existing headwords, ranked over each trimester. 


8 Data sources are discussed in Section 3.1.

The different (overlapping) sets of headwords represented a total of 3,070 English
and 3,168 French words to be annotated. The words were stored in four groups of
spreadsheets, setting apart new and existing entries, English and French words.
Each word was accompanied by a hyperlink to the online article, along with the
definition of the first sense as it stood in Wiktionary. In most cases, the annotation
was rather self-evident.⁹ Conversely, some headwords required further investigation,
e.g. reading the definition or looking for additional encyclopaedic knowledge.
The definition taken from Wiktionary was intended to help annotate new entries 
rather than existing, polysemous ones, which require a look at the online article (and,
sometimes, at the differences between the versions of articles before/after 2020) or 
other sources. Encyclopaedic knowledge was especially necessary for annotating 
words related to the fields of pathology and pharmacology. For example, two drugs 
related to hydroxychloroquine were positively annotated: pamaquine (existing in 
the English Wiktionary since 2009) and quinium (added to the French Wiktionary in 
2020). Revisions of entries denoting other drugs may have been motivated by pan- 
demic-related reasons, such as drugs used in the treatment of respiratory diseases 
(e.g. bambuterol, used in the treatment of asthma). However, in the absence of clear 
evidence of a relation with Covid-19, such headwords were negatively annotated. 
Along the same lines, extractor fan may be related to air purification that helps prevent
the spread of the virus. However, this entry, created in June 2019, makes no refer- 
ence to such a meaning. It was deemed too general and was therefore negatively 
annotated. Conversely, ventilator was annotated positively. Although the word is not
related to the Covid pandemic when used as a synonym of fan, the 2020 updates clearly
target the medical ventilator sense. The previous synonymic definition ‘(medicine) A
respirator’ was changed to ‘(medicine) A machine that moves breathable air into and out
of the lungs of a patient who is unable to breathe sufficiently’, with respirator now
appearing as a hypernym. A picture of a medical ventilator has been added, as well
as the derived term tank ventilator and numerous translations.

In the case of French borrowings from English, the prior annotation of the English
word helped. For instance, it would have been hard to come to a decision in
the case of the neologism doomscrolling ‘(informal) The practice of continually reading
Internet news about catastrophic events’, retrieved from the English Wiktionary,
by only reading its definition. The definition may refer to one’s state of astonishment
when following the news after the pandemic outbreak or when the first lockdown
was decided. But catastrophic events can, sadly, designate numerous other
facts. The problem was finally easily solved, due to the presence of the Coronavirus
category at the bottom of the article. In the French Wiktionary, the article doomscrolling,
which mentions the borrowing from English, but does not mention the pandemic
in the definition or the usage examples, is devoid of any topical category.
The previous annotation of the English word led to a positive annotation. In the
French Wiktionary, the Anglicism contact tracing was added in April 2020, and its
annotation did not raise any difficulty. Conversely, tracking was debatable. Until
2020, the corresponding article described the English gerund. In April 2020, the description
of the French Anglicism tracking was added and defined as surveillance de
masse des populations par pistage de tous les citoyens ‘mass surveillance of populations
by tracking all citizens’. Though the definition does not refer to controlling
the spread of the pandemic, the three usage examples are related to this goal, in
particular to the use of cell phones for contact tracing, which was then a matter of
debate, echoing discussions on other freedom-destroying laws. Knowledge of current
events helped annotate the headword positively. In the English Wiktionary,
pastette was only described as the plural form of the Italian pastetta, a variant of
pastella ‘batter’, until 2020. The English noun was added in 2020 and described as a
synonym of ‘Pasteur pipette’. Although the use of this instrument is not specific to
blood sample collection for Covid-19 testing, the addition of the entry to the dictionary
is obviously related to the pandemic and the headword was therefore positively
annotated.

9 Given the number of regionalisms, occasionalisms and dated words, in addition to words of
different subcultures, a quick look at the definition was necessary, even for French words.
Self-evident therefore means non-ambiguous here, rather than immediate.

Some additions and some words already in the dictionary refer to things of the
past. For example, méthode Raspail entered the French Wiktionary in March 2020
and refers to a hygiene system named after its creator François Vincent Raspail,
mainly based on handwashing and dating back to the nineteenth century. Despite
the lack of exclusive connection to Covid-19, the 2020 addition of this old preventive
measure, simultaneously with the revisions of gel hydroalcoolique ‘alcogel’ and
the addition of geste barrière ‘practice intended to avoid the spread of a virus’, argued
in favour of a positive annotation. Although the 7 revisions of quarantine flag
(in the nautical field, the flag that was hoisted by a ship to signal that it had a contagious
disease aboard) by 5 distinct contributors to the English Wiktionary, which
resulted in a rewording of the definition and the addition of four translations and a
reference, are striking, the word was deemed too indirectly related to the pandemic
to be assigned a positive annotation (the flag is said to have been formerly hoisted
and the reference dates back to 1916).

Lastly, some words were close to being given a positive annotation they did not
deserve. With the videoconferencing software in mind, it was tempting to annotate
positively the French zoom and zoomer, and the English zoomer, without checking
the corresponding definitions. However, the French words are only related to the
camera lens and the revisions of the English zoomer are related to the generational
designation (active boomer, member of Generation Z).¹⁰ The British slang lurgy, denoting
a fictitious, or uncategorised, infectious disease with cold or flu-like symptoms,
that renders one unable to work, was a good candidate. However, several occurrences
found in newspaper articles dating back to Fall 2019 (i.e. before times), whose topic
was the ironically named season “of dreaded lurgy” in reference to people seeking
excuses for work absenteeism, led to the word being rejected.

10 Derivatives of Zoom (the videoconferencing software) retain the initial upper case in English,
according to the English Wiktionary.

The scatterplot in Figure 9 depicts, for the English and the French Wiktionary, 
the Covid-related words in blue circles and the negatively annotated words in yel- 
low circles. Grey triangles correspond to the superposition of several words, some 
of which are related and others not related. Empty circles close to the origin of the 
coordinate system correspond to words that were not annotated because they did 
not rank high enough in any configuration. 

Simple linear regressions were performed for the two language editions and the 
red lines represent the models fitting the distributions. A first observation is that 
the points corresponding to positively annotated headwords are mostly above the 
regression line. Given several articles that received the same number of revisions, 
the articles related to the Covid pandemic are those edited by the largest number of 
contributors. This finding is further investigated in Section 3.2. 
Figure 9: Distribution of the headwords added to Wiktionary in 2020 with respect to revisions and
contributors. 


3 Results


3.1 Contributor types


The performance of the ranking scores was calculated from different data sources in
order to evaluate which kinds of revisions were worth taking into account with respect
to the contributor types (cf. Section 2.1). Figures 10(a) to 10(d) show the results
obtained when considering all revisions compared to the results obtained when
ignoring revisions performed by bots and/or by anonymous contributors. Figures 10(a)
and 10(b) correspond to the English Wiktionary while Figures 10(c) and 10(d)
correspond to the French Wiktionary. For the two language editions, the results obtained
with the ranking score based on the number of revisions are visible on the
left-hand side, i.e. in Figures 10(a) and 10(c). The results obtained with the ranking
score based on the number of contributors are visible on the right-hand side, i.e. in
Figures 10(b) and 10(d). The line charts show the percentage of new entries related
to the pandemic on the ordinate, as a function of the number of candidates examined
(ranging from 1 to 200), on the abscissa.
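
The quantity plotted in these charts can be sketched as follows, assuming the annotations are available as a list of booleans in ranking order. The function name and the toy labels are illustrative only.

```python
def related_percentage_curve(ranked_labels):
    """Percentage of pandemic-related headwords among the first k candidates.

    `ranked_labels` lists the annotations (True = related to Covid-19) in
    ranking order; the result has one value per k = 1..len(ranked_labels).
    """
    curve = []
    hits = 0
    for k, related in enumerate(ranked_labels, start=1):
        if related:
            hits += 1
        curve.append(100.0 * hits / k)
    return curve


# Toy annotations for the five best-ranked candidates (not actual data).
labels = [True, True, False, True, False]
curve = related_percentage_curve(labels)
print(curve)
```

Each point of the curve corresponds to inspecting one more candidate down the list, so a ranking score that pushes related headwords to the top yields a curve that stays high for small k.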

Regardless of the data sources, the results are better for the English Wiktionary
than for the French edition and the ranking score based on the number of contributors
performs better than that based on the number of revisions. These observations are
further discussed in Section 3.2. The experiments confirm that discarding the revisions
performed by bots generally improves the results. Regarding anonymous contributions,
conclusions are mixed. Discarding these contributions significantly lowers
the results obtained with the French Wiktionary. It also lowers the results obtained
with the English Wiktionary when using the ranking score based on the number of
revisions, but it improves those based on the number of contributors. In Figure 10(b),
a clear advantage is visible at the top of the list, up to rank 18. The first downshift
observed in this figure for the data involving anonymous contributors is due to the
headword myroblyte ‘a saint whose relics or place of burial produce or are said to
have produced the Oil of Saints’. With 71 revisions coming from 22 IP addresses, the
headword reaches the third rank. A closer look revealed that the addresses were most
likely assigned to the same machine.¹¹

Given that the ranking score based on the number of contributors outperforms
the ranking score based on the number of revisions regardless of who authored the
revisions, the experiments described in Section 3.2 were performed with the source of
data that produced the best results with the contributors-based score, for each language
edition, i.e. the “no bots, no anonymous” option for the English Wiktionary
and “no bots, with anonymous” for the French edition. The best choice regarding
data sources, however, is unstable. The experiments conducted for each trimester
led to better results for the English Wiktionary when the anonymous contributions
were taken into account. The results presented in Section 3.3 were produced with
the “no bots, with anonymous” option.
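
The data-source options can be pictured as filters applied before counting. The sketch below assumes each revision carries a bot flag and an anonymous flag. The `Revision` record, its field names and the sample data are hypothetical and do not reflect the actual Wiktionary dump format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    headword: str
    contributor: str    # user name, or IP address for anonymous edits
    is_bot: bool
    is_anonymous: bool


def select(revisions, keep_bots, keep_anonymous):
    """Keep only the revisions allowed by the chosen data-source option."""
    return [
        r for r in revisions
        if (keep_bots or not r.is_bot) and (keep_anonymous or not r.is_anonymous)
    ]


def counts(revisions):
    """Map each headword to (number of revisions, number of distinct contributors)."""
    n_revs = {}
    contributors = {}
    for r in revisions:
        n_revs[r.headword] = n_revs.get(r.headword, 0) + 1
        contributors.setdefault(r.headword, set()).add(r.contributor)
    return {h: (n_revs[h], len(contributors[h])) for h in n_revs}


# Hypothetical revision log for a single headword.
log = [
    Revision("covidiot", "Alice", False, False),
    Revision("covidiot", "Bob", False, False),
    Revision("covidiot", "SomeBot", True, False),
    Revision("covidiot", "192.0.2.1", False, True),
]

no_bots_no_anon = counts(select(log, keep_bots=False, keep_anonymous=False))
no_bots_with_anon = counts(select(log, keep_bots=False, keep_anonymous=True))
print(no_bots_no_anon, no_bots_with_anon)
```

In this sketch every distinct IP address counts as a distinct contributor, which is how a single machine with changing addresses can inflate the contributor count, as observed for myroblyte.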


11 All of them share the same two leading numbers, and are probably due to dynamic IP
address assignment. A comparable number of revisions stemming from the same addresses is
observed for the same headword in the French Wiktionary.
Figure 10: Influence of the contributor types on the ranking scores.


3.2 Yearly ranking of the new headwords 


The more up-to-date a dictionary is, and the more exhaustive its list of headwords, 
the more likely a new headword is to be a neologism. The top-ranked new additions 
to Wiktionary, according to the metrics introduced in Section 2.2, were therefore in- 
spected to detect formal neologisms. Table 1 reports, for the English edition of Wik- 
tionary, the 20 most heavily edited new entries in 2020, sorted by number of 
revisions, number of unique contributors, and by the two combinations (product 
and distance) introduced in Section 2. Grey cells indicate headwords that are not 
related to the pandemic, while the others are. 

When ordered by number of revisions, fewer than half of the top-ranked entries
(9 out of 20) are related to the pandemic. When ordered by number of contributors,
90% of them (18 out of 20) are positively annotated. This result seems to confirm
the initial hypothesis: the most frequently edited Wiktionary pages, especially
pages edited by many distinct contributors, can help detect topical neologisms.
Looking down the list after the 20th rank of the contributors-based score helps detect
the following relevant words:¹² fever clinic (21), self-isolate (27), coronoia (29),
before times (38), doomscrolling (41), Wuhan shake (42), maskne (43), SARS-CoV
(48), rat-licker (54), contact trace (58), maskhole (61), elbow bump (74), mascne (88),
plandemic (94), China virus (96), Covidtide (103), corona virus (134), case fatality
rate (144), coronasceptic (180), elbow shake (182), antimasker (238), covid-19 party
(250), long-hauler (272), corona belly (484), community spread (492), etc.


Table 1: 20 most frequently edited pages in the English Wiktionary 2020 additions, according to
different ranking scores (data source: no bots, no anonymous).

Rank  # Revisions                    # Contributors                 Product                        Distance
1     COVID-19                       COVID-19                       COVID-19                       COVID-19
2     social distancing              social distancing              social distancing              social distancing
3     Mount Mayon                    covidiot                       Wuhan pneumonia                Mount Mayon
4     Wuhan pneumonia                flatten the curve              Wuhan virus                    Wuhan pneumonia
5     Wuhan virus                    Wuhan virus                    covidiot                       Wuhan virus
6     Bicol Region                   Wuhan pneumonia                flatten the curve              Chinese virus
7     Peja                           covid                          Chinese virus                  Bicol Region
8     Chinese virus                  social distance                covid                          Peja
9     Berat                          SARS-CoV-2                     infectious disease specialist  covidiot
10    Medusavirus                    Chinese virus                  social distance                flatten the curve
11    Vlorë                          infectious disease specialist  SARS-CoV-2                     Berat
12    Kizilsu                        contact tracing                contact tracing                Medusavirus
13    covidiot                       self-isolation                 Kung Flu                       covid
14    infectious disease specialist  Kung Flu                       Wuhan flu                      infectious disease specialist
15    flatten the curve              Wuhan flu                      Medusavirus                    Vlorë
16    covid                          COVID                          self-isolation                 Kizilsu
17    world map                      Spleef                         world map                      world map
18    Accompong                      Trumpster fire                 Bicol Region                   social distance
19    Arlberg                        self-quarantine                Mount Mayon                    SARS-CoV-2
20    sinoatrial node                Wuhan coronavirus              wokefish                       contact tracing


12 Whether such neologisms should be added to a dictionary headword list is not the focus of the
present research and depends on the editorial policy. The final decision is up to the lexicographer,
and is not discussed here.



The same ranking score applied to the French Wiktionary provides the following
Covid-related words: Covid-19 (1), covid (2), COVID-19 (3), covidiot (5), déconfinement
‘deconfinement, lockdown removal’ (6), distanciation sociale ‘social distancing’ (8),
covidé ‘sick from Covid-19’ (9), déconfiner ‘deconfine, remove lockdown’ (10), reconfinement
‘reconfinement, new lockdown’ (12), masque chirurgical ‘surgical mask’ (18),
coronavirus 2 du syndrome respiratoire aigu sévère ‘SARS-CoV-2’ (19), méthode Raspail
‘hygiene system based on handwashing’ (21), télétravaillable ‘(work) that can be
done by teleworking’ (25), covidien ‘related to, or sick from Covid-19’ (27), cas contact
‘contact case’ (31), distanciation physique ‘physical distancing’ (34), gel hydroalcoolique
‘alcogel’ (35), doomscrolling (49), infodémie ‘infodemic’ (53), hydroxychloroquine
(58), Covid (60), pneumonie de Wuhan ‘Wuhan pneumonia’ (67), syndémie ‘syndemic’
(121), antimasque ‘antimask’ (124), démerdentiel ‘(informal) activity performed with
the means available’ (133), coronasceptique ‘coronasceptic, who denies the reality or
the aftermath of the coronavirus’ (136), autoconfinement ‘self-isolation’ (142), Covid
positif ‘Covid positive’ (156), coronapiste ‘temporary cycle lane built during the Covid-19
pandemic’ (186), raoultiste ‘supporter of Prof. Raoult’ (214), candidat-vaccin ‘vaccine
candidate’ (308), coronavirussé ‘sick from Covid-19’ (361), tempête immunitaire ‘cytokine
storm’ (432), etc.

The performances of the different ranking scores are further illustrated in Figures 11 
and 12 for the English and French language editions. For both languages and for the 
four ranking scores, the line charts are similar to those in Section 3.1 and show the 
percentage of new headwords that are related to the pandemic on the ordinate, as a 
function of the number of candidates considered (ranging from 1 to 200) on the 
abscissa. 

The same observation can be made for both language editions: the ranking 
based on the number of unique contributors performs markedly better than the 
ranking based on the number of revisions. The number of contributors alone even 
outperforms the combinations of the two measures (with the “product” score only 
slightly improving the results locally from ranks 115 to 179 and equalling the results 
of the contributors-based score from rank 187 onwards). 
Figure 11: Performance of the ranking scores for the English Wiktionary new headwords.

Figure 12: Performance of the ranking scores for the French Wiktionary new headwords.


In order to explain the lower results obtained with the French language edition,
a simple linear regression was performed, as depicted in Figure 9 (Section 2.4), where
the regression lines appear in blue. Linear regressions are usually performed to confirm
that two variables are significantly related. In the case of the number of revisions
and contributors, we already know that this is the case, but we are interested in the
regression coefficients. With a value of 0.43,¹³ the slope of the regression line for the
English Wiktionary is greater than that for the French edition (slope value of 0.34).¹⁴
This means that, given two articles selected at random in the English and French editions
that have the same number of revisions, the article from the English Wiktionary
is likely to have been modified by a greater number of contributors than that in the
French Wiktionary. This finding, combined with the better results obtained for the
English language edition, is an argument in favour of the relevance of the “diversity”
measure. To go further in this direction, the coordinates of the positively and negatively
annotated headwords originating from the English Wiktionary were set apart in
two distinct scatterplots, as shown in Figure 13.

13 F(1, 31104) = 6.64e**, p-value < 0.001.
14 F(1, 11619) = 1.545e**, p-value < 0.001.

Figure 13: Distribution and regression lines for the related vs unrelated English headwords.


For each distribution, a simple linear regression was performed. The slope coefficients
are 0.56¹⁵ for the Covid-related words and 0.15¹⁶ for the words that are not
related (regression lines are depicted in red in Figure 13). This means that, given a
number of revisions, a positively annotated headword is likely to have been revised
by a larger number of contributors than a negatively annotated headword with the
same number of revisions.
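
A slope of this kind comes from an ordinary least-squares fit of the number of contributors on the number of revisions. The sketch below uses a hand-written fit to stay dependency-free. The two small datasets are invented for illustration and are not taken from the annotated sets.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (revisions, contributors) pairs for two groups of headwords.
revisions = [2, 4, 6, 10]
contributors_related = [2, 3, 4, 6]      # many distinct contributors
contributors_unrelated = [1, 1, 2, 2]    # few contributors, many revisions each

slope_related, _ = ols_slope_intercept(revisions, contributors_related)
slope_unrelated, _ = ols_slope_intercept(revisions, contributors_unrelated)
print(slope_related, slope_unrelated)
```

A steeper slope means that each additional revision tends to bring in more new contributors, which is the property the comparison between related and unrelated headwords relies on.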

15 F(1, 51) = 730.2, p-value < 0.001.
16 F(1, 633) = 119.7, p-value < 0.001.

For each headword of the two annotation sets, the ratio between the number of
contributors and the number of revisions was calculated. This ratio can be understood
as the “local” diversity for individual headwords: diversity(h) = contributors(h) / revisions(h).
Its maximum value is 1 when all the revisions were made by distinct contributors,
i.e. when contributors(h) = revisions(h). The ratio is low when all the revisions
were made by the same contributor, and especially when the number of revisions is
high. The boxplots in Figure 14 represent variations of this ratio. Figure 14(a) shows
the difference in diversity between related and unrelated headwords in the English
Wiktionary. The median value is 0.62 for the positively annotated words and 0.56 for
the words annotated negatively. A Welch two-sample t-test shows that the difference
is statistically significant.¹⁷ The same experiment was conducted on the French Wiktionary.
For this language edition, the diversity is also higher for the positively annotated
words than for words annotated negatively. This time, however, the difference
was not statistically significant.
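
The diversity ratio and the Welch statistic can be sketched as follows. The (contributors, revisions) pairs are invented, and the sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom: obtaining a p-value would additionally require the t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def diversity(contributors, revisions):
    """Share of revisions made by distinct contributors (1.0 at most)."""
    return contributors / revisions


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of the mean of a
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented (contributors, revisions) pairs for two annotation sets.
related = [(3, 4), (5, 5), (2, 4), (6, 8)]
unrelated = [(1, 4), (2, 5), (1, 2), (3, 10)]

div_related = [diversity(c, r) for c, r in related]
div_unrelated = [diversity(c, r) for c, r in unrelated]
t, df = welch_t(div_related, div_unrelated)
print(t, df)
```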

To conclude on the importance of diversity, we compared the diversity ratio for
the 1598 annotated new headwords (688 originating from the English Wiktionary
and 910 from the French Wiktionary), regardless of the annotation. The variation in
diversity is depicted in Figure 14(b). The diversity is greater in the English Wiktionary
(median value of 0.57 compared to 0.5 for the French Wiktionary) and the difference
is statistically significant.¹⁸ Once again, this result, together with the lower
performance observed for the French Wiktionary, drives home the importance of
the diversity measure.
Figure 14: Variation in the diversity ratio depending on annotations and the language editions.


3.3 Quarterly ranking of the new headwords 


The method proposed above is based on the analysis of Wiktionary revisions on a
whole-year basis. However, retrospectively identifying neologisms one year after
the Covid-19 outbreak (and after lists of neologisms have proliferated) might seem
like making a weather forecast for the day before. For dictionaries such as the
French Petit Robert and Petit Larousse, which are updated once a year, the method
based on yearly analyses makes sense.¹⁹ However, some online dictionaries are updated
more or less continuously and, among them, the OED is ordinarily updated quarterly.
In order to assess the validity of the proposed method for updating such dictionaries
under conditions closer to reality, I examined what would have been the top-ranked
entries by the end of the four 2020 trimesters (thereby mimicking the quarterly updates
that usually occur in the OED). The number of pandemic-related neologisms that would
have been detected among the first 100 candidate headwords by the end of each trimester,
according to the contributors-based score, is reported in Table 2.

17 t(71) = -2.0905, p-value < 0.05.
18 Welch two-sample t-test: t(1423) = 3.7912, p-value < 0.001.


Table 2: Number of Covid-related additions, depending on the number of candidates inspected.

# candidates    English Wiktionary        French Wiktionary
                T1    T2    T3    T4      T1    T2    T3    T4
5                4     5     2     2       2     4     0     3
10               8     9     4     3       3     6     1     4
25              12    12    12     6       3     9     4     6
50              18    14    14     8       3    11     6     8
100             20    18    16    13       4    14     9    16


The relevant words retrieved from the English Wiktionary are listed below. The headwords
in regular type are those which were already detected during the previous
trimesters while headwords in boldface indicate previously undetected words:²⁰

— T1: social distancing, COVID-19, Wuhan pneumonia, flatten the curve,
Wuhan coronavirus, SARS-CoV-2, Kung Flu, Wuhan virus, COVID, social distance,
Wuhan flu, SARS-CoV, case fatality rate, Chinese virus, self-isolate,
covid, community spread, self-isolation, self-quarantine, noncoronaviral

— T2: COVID-19, social distancing, infectious disease specialist, covidiot, SARS-CoV-2,
flatten the curve, Wuhan virus, self-isolation, COVID, Chinese virus, social
distance, corona virus, self-quarantine, elbow shake, Kung Flu, SARS-CoV, Rona

— T3: COVID-19, covid, social distancing, maskne, mascne, plandemic, Chinese
virus, contact tracing, covid-19 party, covidiot, doomscrolling, Wuhan shake,
coronoia, antimasker, social distance, SARS-CoV-2

— T4: fever clinic, rat-licker, COVID-19, before times, China virus, Covidtide, Wuhan
flu, coronasceptic, long-hauler, covidiot, Wuhan virus, long Covid, long covid


19 No Covid-related neologism was added to the printed Petit Robert in 2020 (i.e. to the 2021
edition) but some words were added to the online version, which, for the first time, became out of
sync with the paper version. See:
https://orthogrenoble.net/mots-nouveaux-dictionnaires/entrees-petit-robert-2021/ (last access: 1 June 2021).

20 Discarding previously detected words and upshifting words of lower ranks only results in the
addition of coronapocalypse at the end of the list of the second trimester.


In the French Wiktionary, the relevant words detected are as follows:

— T1: Covid-19, COVID-19, distanciation sociale, pneumonie de Wuhan

— T2: Covid-19, déconfinement, covidiot, covid, covidé, méthode Raspail, infodémie,
déconfiner, distanciation sociale, coronavirus 2 du syndrome respiratoire
aigu sévère, gel hydroalcoolique, COVID-19, distanciation physique,
masque chirurgical

— T3: doomscrolling, covid, démerdentiel, COVID-19, Covid-19, coronapiste,
taux d'incidence ‘incidence rate’, coronavirussé, déconfiner

— T4: Covid-19, covid, télétravaillable, reconfinement, syndémie, covidiot, déconfinement,
candidat-vaccin, cas contact, coronavirus 2 du syndrome respiratoire
aigu sévère, infectivité ‘infectivity’, antivaccinisme ‘antivaccinism,
opposition to vaccination’, télétravaillabilité ‘ability (of a work) to be done by
teleworking’, antimasque, covidien, coronasceptique
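
The distinction between already detected and previously undetected headwords can be derived mechanically from the quarterly lists. The sketch below works on abridged excerpts of the English lists above, with three headwords per trimester chosen by me. The function name is illustrative.

```python
def newly_detected(quarterly_hits):
    """For each period, keep the headwords not seen in any earlier period."""
    seen = set()
    result = []
    for hits in quarterly_hits:
        result.append([h for h in hits if h not in seen])
        seen.update(hits)
    return result


# Abridged excerpts of the English lists for T1 to T3.
t1 = ["social distancing", "COVID-19", "covid"]
t2 = ["COVID-19", "social distancing", "covidiot"]
t3 = ["COVID-19", "covid", "maskne"]

fresh = newly_detected([t1, t2, t3])
print(fresh)
```

On the full lists, the same pass would tell a lexicographer which candidates of a given quarterly update have not been examined before.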


Some of the words detected are true formal neologisms (e.g. COVID-19, Wuhan flu).
Conversely, numerous new headwords already existed before their addition to the
Wiktionary list of headwords, as happens in commercial dictionaries. For example,
case fatality rate has been mentioned by the Office québécois de la langue française
in the terminological record taux de létalité of its Grand Dictionnaire Terminologique
since 2009,²¹ and probably had long been used by specialists of epidemiology and
statistics before it was inventoried in the term bank. The sudden spread of the word
in the general language, due to the (no less sudden) spread of the pandemic, motivated
the creation of the corresponding article.

Reviewing the lists produced by varying the ranking scores and the data sources
is worthwhile, as it retrieves headwords that the “globally better” configuration
misses. For example, with 6 revisions performed by only 2 registered contributors in
the third trimester, supercontaminateur ‘superspreader’ does not rank high enough
to be detected by the contributors-based score but ranks 19th with the
revisions-based score applied to the “no bots, no anonymous” dataset.
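
The effect of switching scores can be illustrated with a minimal sketch. It assumes that the revision log is available as (headword, contributor) pairs and that new headwords are ranked by raw counts; the function names and the toy log are hypothetical, and only the figures for supercontaminateur (6 revisions by 2 contributors) are taken from the text above.

```python
from collections import defaultdict


def count_activity(revisions):
    """Map each headword to (number of revisions, number of unique contributors)."""
    n_revisions = defaultdict(int)
    contributors = defaultdict(set)
    for headword, contributor in revisions:
        n_revisions[headword] += 1
        contributors[headword].add(contributor)
    return {w: (n_revisions[w], len(contributors[w])) for w in n_revisions}


def top_k(activity, index, k):
    """Return the k headwords with the highest count (index 0: revisions, 1: contributors)."""
    return sorted(activity, key=lambda w: (-activity[w][index], w))[:k]


# Toy log (hypothetical): every word except supercontaminateur is a placeholder.
log = (
    [("supercontaminateur", "u1")] * 4
    + [("supercontaminateur", "u2")] * 2
    + [("word_a", u) for u in ("u1", "u2", "u3")]
    + [("word_b", u) for u in ("u4", "u5", "u6", "u7")]
    + [("word_c", "u1")]
)

activity = count_activity(log)
by_contributors = top_k(activity, index=1, k=2)
by_revisions = top_k(activity, index=0, k=2)

# Candidates to review: union of both lists, keeping first-seen order.
candidates = list(dict.fromkeys(by_contributors + by_revisions))
print(candidates)
```

In this toy log, supercontaminateur is outside the top two by contributors but first by revisions, so only the union of the two lists brings it to the reviewer's attention.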


3.4 Yearly ranking of the existing headwords


The same experiments were conducted for existing headwords as for new headwords,
by varying the source of data and the ranking scores. Only the main results
are reproduced here, while others are summarised.


21 http://gdt.oqlf.gouv.qc.ca/ficheOqlf.aspx?Id_Fiche=100408 (last access: 10 June 2022). 
Just as for the analysis related to new headwords, discarding anonymous
contributions slightly improved the results for the English dictionary (the “no bots, no
anonymous” option was therefore chosen). As with the new headwords, the scores
based on the number of contributors provide better rankings than those based on the
number of revisions in the English Wiktionary, as shown in Figure 15. Regarding
median and mean values, neither is consistently better: using one or the other
sometimes improves and sometimes lowers the results locally. The ratio using the
average number of contributors reaches the best result, with 18% of the 100
best-ranked headwords being related to Covid: coronavirus, hydroxychloroquine, rona,
lockdown, superspreader, surgical mask, pandemic, facemask, corona, herd immunity,
MERS, MERS-CoV, Coronavirus, Zoom, ventilator, chloroquine, facial mask, SARS.
The other ranking scores produce the same words (but fewer) in different orders.
They provide only two extra words, respirator and syndemic, that are further down
in the list (ranks 191 and 1353, respectively).
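
The data-source options compared here can be sketched as a simple filter. This is an illustration only: the Revision record and its two flags are hypothetical, and it is assumed that bot accounts and anonymous (IP-attributed) edits have already been identified upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    headword: str
    contributor: str    # user name, or IP address for anonymous edits
    is_bot: bool        # assumed to be precomputed from the wiki's list of bots
    is_anonymous: bool  # assumed True when the edit is attributed to an IP address


def select_revisions(revisions, keep_bots=False, keep_anonymous=False):
    """Return the revisions retained under the chosen data-source option."""
    return [
        r for r in revisions
        if (keep_bots or not r.is_bot) and (keep_anonymous or not r.is_anonymous)
    ]


# Toy revisions (hypothetical contributors).
revisions = [
    Revision("lockdown", "alice", False, False),
    Revision("lockdown", "192.0.2.7", False, True),
    Revision("lockdown", "ExampleBot", True, False),
    Revision("rona", "bob", False, False),
]

no_bots_no_anonymous = select_revisions(revisions)
no_bots = select_revisions(revisions, keep_anonymous=True)
print(len(no_bots_no_anonymous), len(no_bots))
```

Running the ranking on each filtered list in turn is what makes it possible to compare the "no bots" and "no bots, no anonymous" options.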

In the French Wiktionary, considering or discarding anonymous contributions
provides quantitatively similar results, with the set of relevant retrieved words
differing according to the score used. The contributors-based score performs better, but
the difference with the score based on the revisions is less pronounced than when
experimenting with the English Wiktionary. Overall, the proportion of relevant words
identified in the 100 best-ranked words is lower in the French Wiktionary. The best
configuration (ratio involving median values of contributors and no anonymous
contributions) reaches 9%, half of the proportion obtained in the English Wiktionary.
This configuration made it possible to retrieve the words confinement, coronavirus,
pandémie, chloroquine, cluster, télétravail ‘teleworking’, distanciation, contagiosité
and présentiel ‘in-person activity’. Other configurations retrieved three additional
words: SRAS ‘SARS’, quatorzaine ‘two-week quarantine’ and corona.
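
To make the difference between mean-based and median-based ratios concrete, here is a minimal sketch. It assumes that the score divides the number of unique contributors in the target period by the usual number observed in earlier periods; the exact definition is the one given earlier in the chapter, and the function name, the denominator floor and the toy counts below are assumptions made for illustration.

```python
from statistics import mean, median


def ratio_score(current, history, baseline=mean):
    """Ratio of the current count to the usual count.

    current  -- number of unique contributors in the target period
    history  -- counts for the same article in earlier periods
    baseline -- statistics.mean or statistics.median
    """
    if not history:
        raise ValueError("an existing headword needs at least one earlier period")
    usual = baseline(history)
    # Assumption: floor the denominator at 1 so that dormant articles
    # do not cause a division by zero.
    return current / max(usual, 1)


history = [1, 1, 2, 8]  # hypothetical yearly contributor counts
current = 12            # hypothetical count for the target year

print(ratio_score(current, history, mean))    # 12 / 3
print(ratio_score(current, history, median))  # 12 / 1.5
```

A single past burst of activity (the 8 in the toy history) raises the mean more than the median, so the two baselines can order the same articles differently.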

Figure 15: Percentage of Covid-related existing headwords in the English Wiktionary,
depending on the ranking scores.


3.5 Quarterly ranking of the existing headwords


For both languages, discarding the activity of bots improved the results, but
discarding anonymous contributions barely improved them. Scores based on the
number of contributors again produced better rankings than those based on revisions,
although the latter sometimes help detect a few extra relevant words. The numbers of
relevant existing headwords retrieved from the English and French Wiktionary, as a
function of the number of candidates examined, are given in Table 3.


Table 3: Number of identified existing words related to Covid-19, as a function of the number 
of candidates inspected. 


# candidates    English Wiktionary       French Wiktionary
                T1   T2   T3   T4        T1   T2   T3   T4
5                5    5    0    2         2    1    1    0
10               6    8    0    3         3    2    2    0
25               8    9    2    5         4    4    2    1
50              11   10    2    5         6    8    2    3
100             14   15    3    7         7    9    3    5
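
Counts of this kind can be obtained by checking each ranked list against a manually validated set of relevant words. The sketch below is illustrative: the function name and the toy data are hypothetical, and the relevance judgements are assumed to come from manual inspection of the candidates.

```python
def hits_at(ranked, relevant, cutoffs=(5, 10, 25, 50, 100)):
    """Count relevant headwords among the first k candidates, for each cutoff k."""
    relevant = set(relevant)
    counts = {}
    for k in cutoffs:
        counts[k] = sum(1 for word in ranked[:k] if word in relevant)
    return counts


# Toy ranking of ten placeholder headwords, three of them judged relevant.
ranked = [f"w{i}" for i in range(1, 11)]
relevant = {"w1", "w3", "w6"}

print(hits_at(ranked, relevant, cutoffs=(5, 10)))
```

Dividing the count at a cutoff by the cutoff itself gives the proportion of relevant words among the candidates inspected, which is how the percentages for the 100 best-ranked headwords in Section 3.4 can be read.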


Once again, the method achieved better results with the English Wiktionary. For this
language edition, the score based on the current/usual contributors ratio (using
mean values) performed best. The true positives retrieved by this score applied each
trimester to the “no bots, no anonymous” data source are:

- T1: coronavirus, pandemic, lockdown, corona, Coronavirus, quarantine,
Wuhan, rona, herd immunity, panic buying, respirator, superspreader,
MERS-CoV, ventilator

- T2: coronavirus, corona, lockdown, facemask, rona, hydroxychloroquine,
surgical mask, Corona, herd immunity, pulmonologist, pandemic, Zoom,
superspreader, quar, MERS-CoV

- T3: coronavirus, lockdown, intensive care unit

- T4: hydroxychloroquine, coronavirus, superspreader, pandemic, virulent,
chloroquine, rona


Varying the data source (i.e. retaining anonymous contributions) retrieved the addi- 
tional severe acute respiratory syndrome (T1) and viral load (T3). Changing from 
mean to median (still with the contributors ratio) retrieved disinfection (T1) and cor- 
onary (T2) while switching to the revisions-based score added antigen (T1), pulmo- 
nology (T2), immunology (T3) and syndemic (T4). 
The same ranking score (contributors ratio, mean value) applied to the same
type of revisions (“no bots, no anonymous” option) retrieved the following words
from the French Wiktionary:

- T1: coronavirus, confinement, chloroquine, grippe espagnole ‘Spanish
influenza’, pandémie, SRAS, cluster

- T2: confinement, coronavirus, vidéoconférence, pangolin, masque ‘mask’,
pandémie, corona, skypero ‘online apéritif using Skype’, quatorzaine

- T3: coronavirus, présentiel, jauge ‘capacity’

- T4: vaccin, antivaccin, vaccinal, distanciation, écouvillon ‘swab’


Changing from mean to median only added stop and go (T4), while using the revi- 
sions-based score additionally produced virus (T1), télétravail ‘teleworking’ (T2) and 
infectiosité ‘infectivity’ (T4). 

The better results observed for the English Wiktionary are related to the greater 
number of revisions/contributors for some articles. The variation observed for com- 
parable words (e.g. translational equivalents that may have comparable frequen- 
cies, degrees of polysemy and specialisation) can simply be due to the number of 
contributors on the lookout. Another explanation is the possible different degrees 
of completeness of the articles. For example, the existing headword comorbidity 
ranks high in the second trimester for having been revised several times, as shown 
in Figure 5(a) (Section 2.3). Although the article was quite up-to-date (the defini- 
tions were not modified in 2020), a usage example was replaced by another, more 
recent, one with an explicit reference to the coronavirus. A pronunciation, an alter- 
native form (co-morbidity), synonyms and related words were added, as well as nu- 
merous translations. In French, no alternative form exists and the pronunciation 
was already mentioned in the article before the pandemic. The only revision in 
2020 (addition of a recent citation) did not enable the word to be detected. 


3.6 False negatives 


All the experiments above demonstrated that the proposed method uncovers rele- 
vant neologisms, or indicates entries that possibly deserve a review. What the ex- 
periments do not say, however, is what the method missed. I therefore examined 
the ranking of the words included in the two OED updates dedicated to Covid-19 
vocabulary. The words added to the OED in April were generally at the top of the 
list of the headwords detected in Wiktionary, e.g. Covid-19, social distancing, flatten 
the curve, Covid, self-isolation, contact tracing, self-quarantine, self-isolate. Most of 
the words added in July were already in Wiktionary before 2020. Some of these ex- 
isting headwords rank quite high, e.g. corona, surgical mask, MERS, Zoom, dexa- 
methasone, comorbidity. Others rank much lower down, meaning that the articles 
received very few revisions, either because they were overlooked by contributors, or 
because they were already up-to-date. For example, once Kawasaki’s is defined as
‘synonym of Kawasaki disease’, with a hyperlink to the corresponding article, there
is not very much left to say. The most noticeable words that the method failed to
detect are shelter-in-place, added to Wiktionary as a run-on entry under shelter in
2020,²² and words whose ranking scores are very low, e.g. R0 (dated March 2020),
which received 5 revisions made by 3 contributors, and frontliner, unmodified since
its creation in April 2020. Other undetected words or words missing from the
nomenclature are those having more popular derivatives (e.g. contact tracer ranks
relatively low while contact tracing ranks high on the list) or equivalents (community
transmission is absent from Wiktionary, but community spread ranks high).
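
This kind of check against an external reference list can be sketched as follows. The function name, the cutoff and the toy ranking are hypothetical; in particular, the positions below are invented for illustration and are not the ranks observed in the study.

```python
def audit(reference, ranked, cutoff):
    """Classify each reference word as detected, missed (ranked too low) or absent."""
    rank_of = {word: i for i, word in enumerate(ranked, start=1)}
    report = {}
    for word in reference:
        rank = rank_of.get(word)
        if rank is None:
            status = "absent"    # not among the ranked headwords at all
        elif rank <= cutoff:
            status = "detected"  # within the part of the list a reviewer inspects
        else:
            status = "missed"    # present, but ranked below the cutoff
        report[word] = (rank, status)
    return report


# Toy ranking: positions are invented, word_x and word_y are placeholders.
ranked = ["contact tracing", "community spread", "word_x", "word_y", "contact tracer"]
reference = ["contact tracing", "contact tracer", "community transmission"]

for word, (rank, status) in audit(reference, ranked, cutoff=3).items():
    print(f"{word}: rank={rank}, {status}")
```

Separating "missed" from "absent" mirrors the distinction drawn above between headwords that rank too low and words that are missing from the nomenclature.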

Undetected words are those having too few revisions or contributors. One could
speculate that these words are too rare or too specialised to catch contributors’
attention. Another variable is Wiktionarians’ idiosyncratic contribution patterns. Some
contributors make numerous successive revisions as soon as they add a few words, which
may result in a large number of revisions made by a single contributor (the coordinate
points are those along the x-axis, on the right side of the scatterplots in Figure 9). Others
contribute substantial content in a single edit. For instance, the article aéroportage
‘air transportation’ in the French Wiktionary contains two senses (‘transport by air’ and
‘airborne spread of a disease’) with definitions and examples, a pronunciation, inflected
forms, a synonym and a related word. The article was created in November 2020 in a
single edit and has not been modified since. Located at the (1, 1) coordinate point, it is
undetectable. Future experiments will take into account the nature and length of
contributions, which may lead to a refinement of the method presented here.

I investigated above the presence of the OED Covid-related headwords in
Wiktionary. The reverse question is: are Wiktionary’s most heavily modified Covid-related
headwords in the OED? The top-ranked ones are, except for the various stigmatizing
appellations Wuhan virus, Wuhan pneumonia/flu and Chinese virus that were used
before the virus and resulting disease were officially named Covid-19. Humorous
coinages such as corona belly or the derogatory maskhole may not be good candi- 
dates for OED inclusion. Other words such as syndemic and, maybe, doomscrolling, 
could be considered. Regarding semantic neology, the 2020 revisions of antimask in 
Wiktionary could draw attention to the possible need to update the OED article 
which only describes the grotesque dance. 


4 Conclusion 


The present study was based on the hypothesis that Wiktionary’s most heavily
modified articles can help detect new and existing headwords that are related to topical
events. Experimenting on the 2020 revisions and targeting Covid-related vocabulary
proved successful and validated the hypothesis. One finding is that using only the
number of unique contributors performs better than relying on the number of
revisions. In other words, Wiktionary’s “crowd” of contributors is an asset for the task
at hand. This does not mean that the number of revisions is irrelevant. The
conclusion to be drawn is rather that, given a set of articles having a similar number of
revisions, the articles modified by the greatest diversity of contributors are the most
likely to be related to topical events. Varying the ranking scores is also worthwhile,
as it retrieves additional true positives.

22 Wiktionary's run-on entries were not taken into account in the present study.

Using Wiktionary’s revision logs was conceived as a stopgap for situations where no
satisfactory diachronic corpus is available. When such a corpus exists, cross-checking
the results of corpus-driven analyses and Wiktionary’s history mining is certainly a
good option.

A strength of the proposed method is that it is language and topic independent. 
Regarding language, the method is likely to perform well with the editions of Wik- 
tionary that have the most active online communities. Regarding topics, one has to 
keep in mind that an event such as the Covid-19 pandemic is extraordinary, as were 
the two OED updates — and that unprecedented was the Oxford Languages word of 
the year 2020. Whether the suggested method is able to detect lexical innovations 
related to topical events that are less massive is an open question and the subject of 
future analyses. Trawling through the lists of candidates for the present study made 
me confident about that point. Other topics emerged, related for example to the US
presidential election, identity and discrimination questions, police brutality — 2020 
was also the year of the killing of George Floyd that brought the (pre-existing) Black 
Lives Matter movement to the forefront, with the BLM acronym ranking high in Wik- 
tionary in the second trimester. Similar topics emerge from the French Wiktionary. In 
this language edition, a large number of feminine agent nouns were added. Though 
not precisely related to a timestamped event, and even though most of these nouns 
are feminized job titles related to forgotten professions, this trend is noteworthy. 

Wiktionary revision logs give the opportunity to predict the past. A possible
assessment of the suggested method consists in repeating the experiments on the
revisions of previous years and analysing what vocabulary emerges, related to which
topic. In the meantime, the current study led me to examine the revision rate of
quarantine, for which I observed a jump in 2020, and another back in 2009 (cf. Figure 16).
Calculating the most frequently modified articles for that year enabled the detection of
swine flu, which ranked first among the new articles (eclipsing the equally new H1N1),
while, regarding existing headwords, epidemic ranked 111th globally and 16th in
the second trimester; quarantine ranked 141st globally and 7th in the second
trimester; and mask ranked 307th globally and 142nd in the third trimester. Hopefully, the
future will allow for the detection of more enjoyable neologisms to be included in the
dictionary. The present is apparently not a time for complacency. In the French
Wiktionary, vaxxie ‘a selfie taken while getting a Covid-19 vaccine’, centre de vaccination
‘vaccination centre’, Covid long ‘long Covid’, passeport vaccinal ‘vaccine pass’ and
vaccino-sceptique ‘sceptical about the usefulness or the efficiency of vaccines’ are
among the most frequently modified neologisms during the first trimester of 2021
(with respective ranks of 4, 9, 17, 63 and 118). Regarding existing words,
vaccinodrome ‘large capacity vaccination centre’, which was coined in 2020 and entered
Wiktionary in March 2020, was not revised enough to be detected that year, but ranks
24th in the first semester of 2021, while couvre-feu ‘curfew’ ranks 15th in the second
trimester. This is a good point in favour of the suggested method, if not a
light-hearted final note.

Figure 16: Neologisms and existing words showing notable increases in revisions in 2009.